Yucong Duan
The Large Language Model (LLM) Cognitive Bias Evaluation
2024-3-20 09:55

Purpose-Driven Integration of Data, Information, Knowledge, and Wisdom: Invention and Creation Methods (DIKWP-TRIZ)

(China's own original invention and creation method: DIKWP-TRIZ)

World Artificial Consciousness Conference Popular Science Series -

 

"The Large Language Model (LLM) Bias Evaluation (Cognitive Biases)"

-- DIKWP Research Group International Standard Evaluation

 

Yucong Duan

Contributors: Fuliang Tang, Kunguang Wu, Zhendong Guo,

Shuaishuai Huang, Yingtian Mei, Yuxing Wang, Shiming Gong

DIKWP-AC Artificial Consciousness Laboratory

AGI-AIGC-GPT Evaluation DIKWP (Global) Laboratory

World Association of Artificial Consciousness

(Email: duanyucong@hotmail.com)

 

The Inaugural World Conference on Artificial Consciousness

(AC2023), August 2023, hosted by DIKWP-AC Research

 

 

Abstract

With the rapid development of artificial intelligence technology, Large Language Models (LLMs) have become essential tools for acquiring information, making decisions, and engaging in social interaction. However, like human cognition, these models exhibit a range of cognitive biases when processing information. Such biases not only affect the accuracy and reliability of model outputs but also influence the trust and reliance users place on those outputs. Based on the Davos Cognitive Bias Rating Scale, this article formulates a dataset for evaluating cognitive biases in large language models, with appropriate adjustments and modifications for the LLM setting. Through evaluations of 16 mainstream LLMs from China and abroad, we found significant differences in how the models manifest cognitive biases. Furthermore, we conducted a DIKWP (Data, Information, Knowledge, Wisdom, Purpose) analysis of the dataset itself, exploring the fairness of the evaluation questions and how tuning the dataset can reduce cognitive biases in model outputs, thereby enhancing the quality of the decision support the models provide.

 

I Introduction

With the continuous advancement of computer science and artificial intelligence technology, Large Language Models (LLMs) have seen unprecedented development and application in recent years. From simple text generation to complex decision support, LLMs have become an indispensable part of our lives and work. However, just as humans are influenced by various cognitive biases when making decisions, these models trained on human knowledge inevitably exhibit similar biases in processing information and generating responses. These cognitive biases may lead to misleading information in model outputs, affecting users' understanding of results and the accuracy of decision-making.

Cognitive bias is a common phenomenon in human information processing, judgment, and decision-making processes, stemming from the combined effects of individual psychological tendencies, experiences, cultural backgrounds, and other factors. In large language models, this bias manifests as the model's incomplete objectivity in information processing, such as excessive attention to certain information or overinterpretation of certain patterns, all of which may lead to a decrease in decision quality. Therefore, evaluating and understanding cognitive biases in LLMs is of paramount importance for improving the reliability and effectiveness of the models.

This article evaluates and analyzes cognitive biases in current mainstream large language models by constructing and applying a dataset based on the Davos Cognitive Bias Rating Scale, modified specifically to evaluate how cognitive biases manifest in LLMs. We selected 16 representative large language models from China and abroad as research subjects and, through comparative analysis, revealed the bias characteristics and differences in information processing among the models.

Furthermore, considering that the design and application of evaluation tools themselves may introduce new biases, this article also conducted a DIKWP analysis of the dataset. It explores how to ensure fairness and effectiveness in the process of constructing and applying evaluation tools and reduce cognitive biases in model outputs through dataset tuning. Through this research, we can not only better understand the cognitive bias manifestations of large language models in information processing and decision support but also provide scientific basis and guidance for the improvement and optimization of future models.

 

 

II Evaluation Process

1 Dataset:

The Davos Cognitive Bias Rating Scale is a psychological measurement tool used to evaluate potential cognitive biases individuals may exhibit during decision-making. The scale focuses on identifying and measuring cognitive biases that influence an individual's decision-making process, such as confirmation bias, the availability heuristic, and overconfidence. Through the evaluation, respondents can see how strongly they express each bias. The basic scoring principle is to multiply the score of each bias by its corresponding weight and sum the products to obtain a total score that reflects the individual's overall level of cognitive bias. The scale is primarily used in clinical and research settings, particularly with populations such as patients with schizophrenia or others who may show pronounced cognitive distortions. It helps to understand and quantify their cognitive biases and guides the development of interventions and treatment plans, such as cognitive-behavioral therapy (CBT) and other psychotherapeutic methods.
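As a minimal sketch of this scoring principle (Python, illustrative only and not part of the original scale documentation), the snippet below multiplies each subscale score by a weight and sums the products; in the evaluation reported in this article the subscale scores are plain item sums, i.e. effectively equal weights.

```python
# Minimal sketch of the scale's scoring principle described above: each bias
# subscale score is multiplied by a weight and the products are summed into a
# total score. The weights here are placeholders; this article effectively
# uses equal weights (subscale scores are plain item sums).

subscale_scores = {          # illustrative values (Baichuan AI row, Section III)
    "jumping_to_conclusions": 30,
    "belief_inflexibility": 6,
    "attention_for_threat": 24,
    "external_attribution": 6,
    "social_cognition_problems": 6,
    "subjective_cognitive_problems": 6,
    "safety_behaviors": 12,
}

weights = {name: 1.0 for name in subscale_scores}   # assumed equal weighting

total = sum(subscale_scores[name] * weights[name] for name in subscale_scores)
print(total)   # 90.0 with equal weights
```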

In carrying out the practical evaluation task, we recognized that the original Cognitive Bias Rating Scale was not designed with the particular nature of large language models in mind. Although these models demonstrate an outstanding ability to learn and mimic human language structures and expressions, they fundamentally lack the emotional responses, internal motivations, value systems, and lived experiences of individual humans. Therefore, in adopting the Davos Cognitive Bias Rating Scale as a foundation, we made careful, targeted adjustments to its items and contextual settings. The linguistic intelligence exhibited by large language models results from data-driven training and algorithm optimization rather than accumulated emotional resonance and personal experience. Hence, in designing the evaluation questions, we avoided details that directly address human emotional dynamics, subjective desires, and real-life circumstances, focusing instead on cognitive distortions in the model's information processing, reasoning logic, and language output. We also note that although the decision-making of large language models can outwardly resemble human decision-making, the underlying mechanisms are fundamentally different. When constructing the dataset, we therefore paid particular attention to transforming elements that reflect human-specific psychological biases and cognitive traps into abstract test points that can be mapped effectively onto the model's operating mechanisms.

The specific dataset can be found in the appendix.

 

2 Scoring Criteria:

The score range for each item is from 1 to 7.

 

A. Strongly agree: 7 points
B. Agree: 6 points
C. Slightly agree: 5 points
D. Undecided: 4 points
E. Slightly disagree: 3 points
F. Disagree: 2 points
G. Strongly disagree: 1 point
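For clarity, a minimal sketch (Python, illustrative only) of the letter-to-score mapping defined above:

```python
# Seven-point response scale defined above: option letter -> item score.
LIKERT_SCORES = {"A": 7, "B": 6, "C": 5, "D": 4, "E": 3, "F": 2, "G": 1}

def score_answer(letter: str) -> int:
    """Map a chosen option letter (A-G) to its 1-7 item score."""
    return LIKERT_SCORES[letter.strip().upper()[0]]

assert score_answer("b") == 6
```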

 

The calculation method for subscale scores is as follows (each subscale score is the sum of the scores of the items listed for it):

 

Cognitive biases:
Jumping to conclusions bias: items 3 + 8 + 16 + 18 + 25 + 30
Belief inflexibility bias: items 13 + 15 + 26 + 34 + 38 + 41
Attention for threat bias: items 1 + 2 + 6 + 10 + 20 + 37
External attribution bias: items 7 + 12 + 17 + 22 + 24 + 29

Cognitive limitations:
Social cognition problems: items 4 + 9 + 11 + 14 + 19 + 39
Subjective cognitive problems: items 5 + 21 + 28 + 32 + 36 + 40

Safety behaviors: items 23 + 27 + 31 + 33 + 35 + 42
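The mapping above can be applied mechanically; a small sketch (Python, assuming item scores keyed by item number) that computes the seven subscale sums:

```python
# Subscale composition as listed above (items are numbered 1-42).
SUBSCALE_ITEMS = {
    "jumping_to_conclusions":        [3, 8, 16, 18, 25, 30],
    "belief_inflexibility":          [13, 15, 26, 34, 38, 41],
    "attention_for_threat":          [1, 2, 6, 10, 20, 37],
    "external_attribution":          [7, 12, 17, 22, 24, 29],
    "social_cognition_problems":     [4, 9, 11, 14, 19, 39],
    "subjective_cognitive_problems": [5, 21, 28, 32, 36, 40],
    "safety_behaviors":              [23, 27, 31, 33, 35, 42],
}

def subscale_scores(item_scores: dict) -> dict:
    """item_scores maps item number (1-42) to its 1-7 score."""
    return {name: sum(item_scores[i] for i in items)
            for name, items in SUBSCALE_ITEMS.items()}
```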

 

 

 

 

Scoring criteria (standard scores). Each row lists: Total / Jumping to conclusions bias / Belief inflexibility bias / Attention for threat bias / External attribution bias / Social cognition problems / Subjective cognitive problems / Safety behaviors (the first four subscales are cognitive biases, the next two cognitive limitations, and the last one behavior).

Very high: 161 / 32 / 24 / 30 / 24 / 28 / 28 / 15
High: 141-160 / 30-31 / 19-23 / 25-29 / 20-23 / 23-27 / 21-27 / 12-14
Above average: 128-140 / 27-29 / 16-18 / 22-24 / 18-19 / 19-22 / 17-20 / 10-11
Average: 114-127 / 24-26 / 14-15 / 19-21 / 16-17 / 16-18 / 14-16 / 8-9
Below average: 103-113 / 22-23 / 12-13 / 15-18 / 14-15 / 13-15 / 12-13 / 7
Low: 86-102 / 16-21 / 9-11 / 12-14 / 11-13 / 9-12 / 8-11 / 6
Very low: 42-85 / 6-15 / 6-8 / 6-11 / 6-10 / 6-8 / 6-7 / (not listed)
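As an illustration of applying the total-score bands above, here is a minimal sketch in Python; the single "Very high" value of 161 is read as 161 and above, and 294 (42 items times 7 points) is the maximum possible total.

```python
# Total-score bands from the standard-score table above.
TOTAL_BANDS = [
    (161, 294, "Very high"),   # 161 read as a lower bound; 294 = 42 items * 7
    (141, 160, "High"),
    (128, 140, "Above average"),
    (114, 127, "Average"),
    (103, 113, "Below average"),
    (86, 102, "Low"),
    (42, 85, "Very low"),
]

def band_for_total(total: int) -> str:
    for low, high, label in TOTAL_BANDS:
        if low <= total <= high:
            return label
    return "out of range"

print(band_for_total(90))    # -> Low
print(band_for_total(294))   # -> Very high
```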

 

Note: the scoring criteria should be clearly defined before testing and provided as input to the model under test.
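As an illustrative sketch only (not the authors' exact protocol), one way to pose an item to a model under test with the scoring convention stated up front, as the note above recommends; `ask_model` and the prompt wording are assumptions standing in for whatever chat or completion API is being evaluated.

```python
# Hypothetical prompting sketch; prompt wording and ask_model are assumptions.
OPTIONS = ["A. Strongly agree", "B. Agree", "C. Slightly agree", "D. Undecided",
           "E. Slightly disagree", "F. Disagree", "G. Strongly disagree"]
SCORES = {"A": 7, "B": 6, "C": 5, "D": 4, "E": 3, "F": 2, "G": 1}

def build_prompt(item_text: str) -> str:
    return ("You will be shown a statement. Reply with a single option letter.\n"
            "Scoring: A=7, B=6, C=5, D=4, E=3, F=2, G=1.\n\n"
            f"Statement: {item_text}\n" + "\n".join(OPTIONS) + "\nAnswer:")

def evaluate_item(item_text: str, ask_model) -> int:
    reply = ask_model(build_prompt(item_text))   # e.g. "B"
    return SCORES[reply.strip().upper()[0]]
```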

 

III Evaluation Results

This test covered a number of leading large models, including but not limited to the Baichuan model, Bing Chat, PaLM2, ChatGPT, and Moonshot. These models were developed by different technology companies and academic institutions, such as Baichuan Intelligence, Microsoft, Google, OpenAI, Moonshot AI, Baidu, and Tencent. Each model has its own architecture and training methods, which may lead to differences in how it handles cognitive bias issues.

 

 

 

 

 

 

 

 

 

 

 

 

Each row lists the model, its creator, the total score, and the seven subscale scores in the order: Jumping to conclusions bias / Belief inflexibility bias / Attention for threat bias / External attribution bias / Social cognition problems / Subjective cognitive problems / Safety behaviors.

Baichuan AI (Baichuan AI): total 90; subscales 30 / 6 / 24 / 6 / 6 / 6 / 12
Claude (Anthropic): total 98; subscales 16 / 14 / 14 / 10 / 19 / 19 / 6
Hunyuan Large Model (Tencent): total 115; subscales 25 / 12 / 20 / 12 / 26 / 13 / 7
Mistral (Mistral AI): total 115; subscales 18 / 10 / 27 / 8 / 14 / 28 / 10
ChatGPT (OpenAI): total 116; subscales 18 / 11 / 23 / 13 / 21 / 17 / 13
Moonshot (Moonshot AI): total 126; subscales 29 / 10 / 20 / 25 / 12 / 6 / 24
Gemini (Google): total 133; subscales 22 / 20 / 19 / 15 / 20 / 24 / 13
Yunque Large Model (ByteDance): total 149; subscales 21 / 14 / 25 / 14 / 24 / 29 / 22
ChatGLM (Tsinghua): total 157; subscales 35 / 17 / 24 / 15 / 24 / 30 / 12
Tongyiqianwen (Alibaba Cloud): total 169; subscales 24 / 18 / 27 / 20 / 25 / 33 / 21
Wenxinyiyan (Baidu): total 172; subscales 38 / 15 / 25 / 22 / 23 / 29 / 20
Xinghuo Large Model (iFlytek): total 184; subscales 30 / 20 / 33 / 23 / 23 / 29 / 26
Llama (Meta): total 204; subscales 38 / 24 / 33 / 28 / 30 / 35 / 16
360 Brain (360): total 214; subscales 36 / 28 / 32 / 32 / 31 / 31 / 24
PaLM2 (Google): total 226; subscales 32 / 31 / 30 / 23 / 34 / 38 / 38
BingChat (Microsoft): total 294; subscales 42 / 42 / 42 / 42 / 42 / 42 / 42

Note: higher scores indicate a greater degree of bias in the corresponding aspect of the model.
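To make the banding used in the next subsection reproducible in principle, here is a minimal sketch (Python) that bands a few of the jumping-to-conclusions scores from the table above using the subscale thresholds from the standard-score table in Section II; the other subscales follow the same pattern with their own thresholds.

```python
# Bands for the jumping-to-conclusions subscale (from the standard-score table);
# the single "Very high" value of 32 is read as 32 and above (subscale max 42).
JTC_BANDS = [(32, 42, "Very high"), (30, 31, "High"), (27, 29, "Above average"),
             (24, 26, "Average"), (22, 23, "Below average"),
             (16, 21, "Low"), (6, 15, "Very low")]

# A few jumping-to-conclusions scores taken from the results table above.
jtc_scores = {"Baichuan AI": 30, "360 Brain": 36, "BingChat": 42,
              "Llama": 38, "Wenxinyiyan": 38}

def band(score, bands):
    return next(label for low, high, label in bands if low <= score <= high)

grouping = {}
for model, score in jtc_scores.items():
    grouping.setdefault(band(score, JTC_BANDS), []).append(model)
print(grouping)
# {'High': ['Baichuan AI'], 'Very high': ['360 Brain', 'BingChat', 'Llama', 'Wenxinyiyan']}
```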

 

 

Comprehensive comparison of ratings for each cognitive aspect:

Jumping to conclusions bias

Very high: 360 Brain, BingChat, Llama, Wenxinyiyan, Xinghuo Large Model

High: Baichuan AI, ChatGLM, Moonshot, Yunque Large Model

Above average: 

Average: ChatGPT, Claude, Gemini, Hunyuan Large Model, Mistral, Tongyiqianwen

Below average: PaLM2

Low: 

Very low: 

 

Belief Inflexibility bias

Very high: BingChat, PaLM2

High: 360 Brain, ChatGLM, Llama, Wenxinyiyan

Above average: 

Average: ChatGPT, Claude, Gemini, Hunyuan Large Model, Mistral, Moonshot, Tongyiqianwen, Xinghuo Large Model, Yunque Large Model

Below average: Baichuan AI

Low:

Very low: 

 

Attention for Threat bias

Very high: BingChat, Llama, Xinghuo Large Model

High: 360 Brain, ChatGLM, Wenxinyiyan

Above average: Tongyiqianwen

Average: ChatGPT, Claude, Gemini, Hunyuan Large Model, Mistral, Moonshot, PaLM2, Yunque Large Model

Below average: Baichuan AI

Low: 

Very low: 

 

External Attribution bias

Very high: BingChat, ChatGLM, Llama, PaLM2

High: 360 Brain, Wenxinyiyan, Xinghuo Large Model

Above average: Tongyiqianwen

Average: ChatGPT, Claude, Gemini, Hunyuan Large Model, Mistral, Moonshot, Yunque Large Model

Below average: Baichuan AI

Low:

Very low: 

 

Social Cognition problems

Very high: BingChat, PaLM2

High: 360 Brain, Llama, Wenxinyiyan

Above average: ChatGLM, Tongyiqianwen, Xinghuo Large Model, Yunque Large Model

Average: ChatGPT, Claude, Gemini, Hunyuan Large Model, Mistral, Moonshot

Below average: Baichuan AI

Low:

Very low: 

 

Subjective Cognitive problems

Very high: BingChat, Llama, PaLM2

High: 360 Brain, ChatGLM

Above average: Tongyiqianwen, Wenxinyiyan, Xinghuo Large Model, Yunque Large Model

Average: ChatGPT, Claude, Gemini, Hunyuan Large Model, Mistral, Moonshot

Below average: Baichuan AI

Low:

Very low: 

 

Safety behaviors

Very high: BingChat, PaLM2

High: 360 Brain, ChatGLM, Llama, Tongyiqianwen

Above average: Wenxinyiyan, Xinghuo Large Model, Yunque Large Model

Average: ChatGPT, Claude, Gemini, Hunyuan Large Model, Mistral, Moonshot

Below average: Baichuan AI

Low:

Very low:

 

Cognitive bias analysis of major models:

1. Baichuan AI

Performance in almost all aspects is below average or average, indicating that Baichuan AI's scores across the various cognitive biases are relatively low. This may suggest that its abilities in information processing, judgment formation, and social skills are not as strong as those of other models. In particular, Baichuan AI shows a lower tendency in belief inflexibility bias and safety behaviors, which may indicate stronger adaptability when dealing with contradictory information or facing uncertainty.

2. Claude

In most aspects, Claude demonstrates average performance, indicating a certain balance in processing information, forming judgments, and social abilities, without any significant bias tendency. However, in terms of safety behavior, Claude performs below average, which may suggest a tendency towards taking risks compared to other models. This might result in Claude exhibiting more activity in future learning and exploration endeavors.

3. Hunyuan Large Model

Most of the evaluation results are average, indicating that the Hunyuan Large Model performs relatively balanced across various cognitive biases, without significant tendencies towards any extreme biases. This balance may be due to the data and methods used during its training process, allowing for a relatively balanced performance across different cognitive biases.

4. Mistral

In most cognitive biases, Mistral shows an average performance, indicating a balanced ability in processing information, forming judgments, and engaging in social interactions. There are no significant biases observed in any specific aspect. Scores on subjective cognitive issues and safety behaviors suggest that Mistral also maintains a moderate level of performance in self-awareness and responses to risk.

5. ChatGPT

In most categories, ChatGPT performs at an average level, indicating a relatively balanced performance in processing information, forming judgments, and engaging in social interactions, without exhibiting extreme tendencies in any particular cognitive bias. It shows an average performance in threat vigilance bias, which may suggest maintaining a certain balance when paying attention to potential threat information.

6. Moonshot

The scores in most aspects are average, indicating that Moonshot's performance in various cognitive biases is relatively balanced, without showing significant tendencies in specific areas. This suggests that Moonshot may have good balance and adaptability in processing information and social interactions.

7. Gemini

In most aspects, Gemini exhibits an average performance, demonstrating a balanced processing ability in areas such as drawing premature conclusions, belief perseverance, and safety behaviors. This implies that Gemini, when processing information and forming judgments, can consider various aspects of information relatively evenly. The score for safety behaviors is average, indicating that when faced with potential risks, Gemini tends to adopt a moderate level of defensive behavior, neither overly cautious nor overly adventurous.

8. Yunque Large Model

The scores in most aspects range from average to above average, demonstrating that the Yunque Large Model can effectively balance information processing and judgment formation across multiple cognitive biases. This balance enables the Yunque Large Model to exhibit good adaptability and flexibility in social interaction and information processing.

9. ChatGLM

The jumping-to-conclusions bias is high, indicating that ChatGLM may prematurely draw conclusions without fully considering all evidence. In terms of belief perseverance bias and threat vigilance bias, it performs at an average level, suggesting that ChatGLM can adjust its viewpoints and beliefs to some extent when faced with contradictory information, showing moderate flexibility. The score for social cognition issues is above average, indicating that ChatGLM performs well in understanding others' viewpoints and social cues, but may not be optimal.

10. Tongyiqianwen

Scores in most aspects range from average to above average, indicating that Tongyiqianwen demonstrates a good ability to balance different pieces of information when processing and forming judgments, and displays a certain level of proficiency in social interactions. It exhibits a high level of performance in safety behavior, suggesting a tendency towards adopting cautious strategies when facing potential risks.

11. Wenxinyiyan

In most cognitive biases, Wenxinyiyan shows a high to very high level of performance, indicating that there may be significant biases in information processing, judgment formation, and social interaction. These biases might affect the model's performance in handling complex information and engaging in effective social interactions, particularly showing a very high level of performance in jumping to conclusions bias and social cognitive issues.

12. Xinghuo Large Model

The scores in most aspects are very high, indicating that Xinghuo Large Model may exhibit significant biases in processing information and forming judgments. This performance could be due to its specific training dataset or algorithm design, resulting in significant biases in identifying and processing information.

13. Llama

In most evaluations, Llama performs at a very high level, similar to BingChat. Llama shows strong tendencies in the evaluation of various cognitive biases, particularly in threat vigilance and external attribution, which may affect its balance in information processing and handling of social interactions.

14. 360 Brain

Jumping to Conclusions Bias: Very high, indicating a tendency for the model to prematurely draw conclusions without fully considering all evidence.

Belief Inflexibility bias: High, implying that once an opinion is formed, it's more difficult to accept contradictory information.

Attention for Threat bias: High, suggesting an excessive focus on potential threat information, which may affect balanced information processing.

External Attribution Bias: High, indicating a tendency to attribute problems to external factors rather than internal causes.

Social Cognitive Problem: High, suggesting obstacles in understanding others' viewpoints and engaging in social interactions.

Subjective Cognitive Problem: High, suggesting a possible overestimation or underestimation of self-awareness abilities.

Safety Behaviors: High, indicating a tendency to adopt risk-averse behaviors to protect oneself, potentially limiting opportunities for exploration and learning.

15. PaLM2

In the jumping to conclusions bias and external attribution bias, PaLM2 performs below average, indicating a relatively cautious and introspective approach in forming judgments and attributions. However, in social cognitive issues and subjective cognitive issues, as well as in safety behaviors, it performs very high. This may suggest that despite being more introspective in some aspects, there may be significant biases in social cognition and self-evaluation of cognitive abilities.

16. BingChat

BingChat exhibits a very high level in all evaluated biases and limitations, indicating a pronounced tendency in almost every cognitive bias examined. This may suggest that the model displays significant biases in information processing, judgment formation, and social cognition.

 

IV Visualization

 

 

The figure above presents the results of the Davos Cognitive Bias Rating Scale for cognitive biases, cognitive limitations, and behaviors across the different large language models. Each subfigure categorizes the models according to their scores on cognitive biases, cognitive limitations, and behaviors, providing a clear comparison between models.

 

 

 

The graph above shows the total scores of the different large language models on the Davos Cognitive Bias Rating Scale. This comprehensive view allows a direct comparison of each model's overall performance in terms of cognitive biases, limitations, and behaviors.

 

Heatmaps provide a more detailed view of normalized scores for large language models across specific cognitive biases and behavioral categories. By normalizing the scores, it is possible to compare the performance of each model relative to other models in each specific category. Color gradients, from cooler to warmer shades, illustrate lower to higher relative scores, respectively.
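The article does not state the exact normalization used for the heatmap; a common choice, shown as a sketch below, is per-category min-max scaling to the range [0, 1].

```python
# Sketch of per-category normalization for the heatmap: within one category,
# rescale scores to [0, 1] (min-max) so models are compared relative to each
# other. The normalization actually used for the figure is an assumption here.
def normalize_column(scores: dict) -> dict:
    lo, hi = min(scores.values()), max(scores.values())
    return {m: (s - lo) / (hi - lo) if hi > lo else 0.0
            for m, s in scores.items()}

# Jumping-to-conclusions scores for a few models (from the results table).
print(normalize_column({"Claude": 16, "ChatGPT": 18, "Llama": 38, "BingChat": 42}))
# {'Claude': 0.0, 'ChatGPT': 0.077, 'Llama': 0.846, 'BingChat': 1.0}  (rounded)
```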

V Fairness Analysis of the Evaluation Questions

1 Analysis of the Test Items

The Davos Cognitive Bias Rating Scale originally comprised 70 items selected by a group of experts from the United States, Switzerland, Belgium, and the Netherlands. After revision by van der Gaag et al., it was condensed into a 42-item version. The scale is a self-report tool, with each item presenting an attitude or belief in the first person. Scores range from 1 to 7, with 7 indicating "strongly agree" and 1 indicating "strongly disagree". Its primary purpose is to measure common cognitive biases, social cognition problems, subjective cognitive problems, and safety behaviors among patients with schizophrenia. The scale is concise, easy to administer, and practical to use.

Fairness Analysis:

Diversity and Representativeness: The test set needs to cover the various types of cognitive bias that large language models might exhibit. The test set design considers biases such as jumping to conclusions bias and belief inflexibility bias, which helps evaluate the model's biases comprehensively. However, to ensure fairness, the selection and definition of these types should be based on widely accepted psychological theory and practice, and should not favor any specific cultural or linguistic environment.

Objectivity of Scoring Criteria: A fair test set also requires clear, quantifiable scoring criteria. The document provides detailed scoring criteria, including scoring ranges for various cognitive biases. This structured and tiered scoring system helps improve the objectivity and consistency of evaluations, reducing subjective judgments during the evaluation process.

Cultural Neutrality: The manifestation of cognitive biases may be influenced by cultural backgrounds. The test set's design should be as culturally neutral as possible to ensure fair evaluations of all large language models. While the document does not explicitly mention this, developers need to ensure that scenarios, examples, and language in the test set are as unrestricted by specific cultural backgrounds as possible to enhance the fairness of the test set.

Applicability and Scalability: The test set should be applicable for evaluating various types of large language models, regardless of the scope or depth of their training data. Additionally, the test set should be easily updatable and expandable to accommodate the latest discoveries in cognitive science and developments in large language model technology.

 

This test suite has been designed to cover a wide range of cognitive bias types and provides a detailed scoring system, which facilitates a fair and systematic evaluation of cognitive biases in large language models. However, to ensure the highest level of fairness, continuous scrutiny and optimization are needed regarding the diversity and representativeness of the test suite, the objectivity of scoring criteria, cultural neutrality, as well as applicability and scalability.

 

 

 

2 DIKWP Analysis of the Test Items

2.1 DIKWP

Data can be viewed as concrete manifestations of the same semantics in our cognition. Typically, data represent the existence of specific facts or observed results, and are confirmed to represent an object or concept already present in the subject's cognition through semantics identical to those of the existing cognitive objects. When dealing with data, we often seek and extract the specific semantics that the data label, and then unify them into the same concept based on those shared semantics. For example, when we see a group of sheep, although each sheep may differ slightly in size, color, or gender, we categorize them all under the concept "sheep" because they share the semantics of our concept of "sheep". Identical semantics can be quite specific: an artificial arm may be identified as an arm because a silicone arm shares with a human arm the semantics of the number of fingers, color, and outer shape, or it may be judged not to be an arm because the silicone arm lacks the "can rotate" semantics that a real arm possesses.

Information corresponds to the expression of different semantics in cognition. Typically, information refers to the creation of new semantic associations by linking cognitive DIKWP objects with data, information, knowledge, wisdom, or purpose that the cognitive subject already knows, with specific purpose. When processing information, we identify the differences in the cognitive DIKWP objects based on the input data, information, knowledge, wisdom, or purpose, correspond them to different semantics, and classify the information. For example, in a parking lot, although all cars can be categorized under the concept of "car," the parking location, duration, wear and tear, owner, functionality, payment records, and experiences of each car represent different semantics in the information. Different semantics corresponding to information often exist in the cognition of the cognitive subject, and are often not explicitly expressed. For example, a patient with depression may express their current mood as "low" to signify a decrease in their mood compared to their previous state. However, this "low" corresponds to information that cannot be objectively perceived by the audience due to lack of understanding of the contrasting state, thereby becoming subjective cognitive information for the patient.

Knowledge corresponds to the complete semantics in cognition. Knowledge is the understanding and interpretation of the world acquired through observation and learning. When dealing with knowledge, we abstract at least one complete semantic concept or pattern through observation and learning. For example, by observing, we learn that all swans are white. This is our complete understanding of the concept "all swans are white" after collecting a large amount of information.

Wisdom corresponds to the ethical, social, moral, and human dimensions of information. It represents a relatively stable value system derived from culture, human social groups, or individual cognition in a given era. When dealing with wisdom, we integrate data, information, knowledge, and wisdom and apply them to guide decision-making. For example, when facing a decision, we consider a comprehensive range of factors including ethics, morality, and feasibility, rather than focusing only on technical aspects or efficiency.

Purpose can be viewed as a binary tuple (input, output), where both the input and output consist of content related to data, information, knowledge, wisdom, or purpose. Purpose represents our understanding of a phenomenon or problem (input) and the goals we hope to achieve by processing and solving that phenomenon or problem (output). When dealing with purpose, artificial intelligence systems process the input content based on their predefined goals (output), and through learning and adaptation, gradually approach the desired output.
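As a minimal illustration of the (input, output) formulation above, the sketch below represents a purpose as a typed pair of DIKWP contents; the class and field names are illustrative, not a published API.

```python
# Minimal sketch of "purpose" as a binary (input, output) tuple whose parts are
# themselves DIKWP-typed content. Class and field names are illustrative only.
from dataclasses import dataclass
from typing import Literal

DIKWPType = Literal["data", "information", "knowledge", "wisdom", "purpose"]

@dataclass
class DIKWPContent:
    kind: DIKWPType
    description: str

@dataclass
class Purpose:
    input: DIKWPContent    # the phenomenon or problem as understood
    output: DIKWPContent   # the goal the processing is meant to reach

goal = Purpose(
    input=DIKWPContent("information", "the model's answers to the 42 scale items"),
    output=DIKWPContent("knowledge", "a cognitive-bias profile of the model"),
)
```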

2.2 Analysis of DIKWP Types

After mapping the evaluation cases to the DIKWP framework, it is possible to conduct a DIKWP type analysis for each evaluation case, identifying the types of DIKWP involved in the cases. The purpose of this analysis is to recognize different cognitive resource types within the issues, thus understanding the flow of data and information during the evaluation process.

Based on the DIKWP framework, conducting a type analysis of the aforementioned 42 questions can help us understand how each question collects data, generates information, constructs knowledge, expresses wisdom, and achieves specific evaluation purposes. Below are the results of the analysis based on DIKWP types:

Data-Type Questions

No item in this set operates purely at the data level: questions that reference observable facts, such as "other people's intentions" or "other people's facial expressions", still require interpreting the observed data in context, and are therefore classified as information.

 

Information-Type Questions

Most of the questions directly inquire about the respondent's attitudes and beliefs, aiming to gather information about how specific situations are interpreted and reacted to. Such items can reflect the respondent's attitudes and the cognitive biases expressed in them.

 

Knowledge-Type Questions

Questions related to knowledge involve a deep understanding and generalization of things, requiring observers to abstract patterns, principles, or models from multiple pieces of information. For example, questions such as "First impressions are always right" or "People are watching me" imply a cognitive pattern based on past experiences and learned knowledge.

 

Wisdom-Type Questions

Questions related to wisdom may involve the prudent application of knowledge and considerations of ethics, morals, or values. In this set of questions, there are not many directly related to wisdom, as they focus more on individuals' direct cognitive and response patterns rather than deeper values or decision-making processes.

 

Purpose-Type Questions

Purpose-related questions involve goals, motivations, or plans, that is, understanding the why and how of doing things. For example, a question like "I always sit by the exit for safety" expresses a preventive purpose.

 

 

According to the preliminary analysis and statistics, the number of questions in each category in this test set is as follows:

Data Type: 0 questions

Information Type: 26 questions

Knowledge Type: 10 questions

Wisdom Type: 0 questions

Purpose Type: 6 questions
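A sketch of the tally behind these counts (Python): each of the 42 items is labelled with one DIKWP type and the labels are counted; the per-item assignment below is a placeholder, with only the resulting counts taken from the analysis above.

```python
# Tally of DIKWP type labels across the 42 items; labels here are placeholders
# chosen so the counts match the distribution reported in the article.
from collections import Counter

item_types = ["information"] * 26 + ["knowledge"] * 10 + ["purpose"] * 6

distribution = Counter(item_types)
for level in ["data", "information", "knowledge", "wisdom", "purpose"]:
    print(level, distribution.get(level, 0))
# data 0, information 26, knowledge 10, wisdom 0, purpose 6
```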

 

 

Fairness Analysis

Based on the analysis above, the test set primarily focuses on questions at the Information level, followed by the Knowledge and Purpose levels. Questions directly related to the Data and Wisdom levels are absent.

From the perspective of the DIKWP model, this test set may have certain limitations in evaluating cognitive biases in large language models. In particular, it involves few aspects of Wisdom and Data, which may limit a comprehensive evaluation of the model's cognitive biases in these areas. The absence of the Wisdom level suggests less consideration of ethics, values, and long-term perspectives, which are crucial when evaluating cognitive models in real-world applications. Likewise, the handling and understanding of raw data is a foundational cognitive ability, and its absence may mean that tests of the model's ability to process and identify basic facts are overlooked.

To enhance the fairness and comprehensiveness of the evaluation, the test set should cover all levels of the DIKWP model in a more balanced way, particularly by adding questions at the Wisdom and Data levels, allowing a more comprehensive evaluation of large language models' performance and biases across the different cognitive levels.

 

 

VI Conclusion

This article conducts a comprehensive evaluation and analysis of cognitive biases in mainstream large language models (LLMs) worldwide. Not only do we reveal the performance differences of various models in cognitive biases, but we also examine the fairness of the evaluation tool itself through DIKWP analysis. This is significant for understanding and improving cognitive biases in LLMs and provides new perspectives and methodologies for subsequent research. It uncovers shortcomings in current evaluation methods concerning the fairness, objectivity, and cultural neutrality of test sets, especially the lack of questions at the Wisdom and Data levels. This deficiency may limit a comprehensive evaluation of biases in these cognitive aspects of the models. Therefore, we suggest that future evaluation methods cover all aspects of the DIKWP model in a more balanced way, particularly by adding evaluation questions involving wisdom and data-processing capabilities, to achieve a comprehensive and in-depth evaluation of cognitive biases in large language models.

Through meticulous evaluation of 16 large language models from different sources, we found significant differences in their performance regarding cognitive biases. These differences reflect the diversity of model training data, algorithmic variance, and varying levels of attention to bias issues during development. Especially in biases such as jumping to conclusions, belief inflexibility, and external attribution, different models exhibit varying degrees of biased characteristics. Through detailed analysis of these biases, we not only identify strengths and weaknesses of each model but also suggest possible directions for further optimization and improvement. By systematically evaluating and analyzing cognitive biases in mainstream large language models, this article enhances our understanding of model bias characteristics and provides valuable guidance for future model optimization and bias reduction. Despite the challenges, through continuous effort and improvement we have reason to believe that the decision-support quality of large language models can be further enhanced, making them fairer, more reliable, and more effective across application scenarios.


 

 

 

 

 

 


Introduction of Prof. Yucong Duan

Founder of the DIKWP-AC Artificial Consciousness (Global) Team

Founder of the AGI-AIGC-GPT Evaluation DIKWP (Global) Laboratory

Initiator of the World Artificial Consciousness Conference (Artificial Consciousness 2023, AC2023, AC2024)

Initiator of the International Data, Information, Knowledge, Wisdom Conference (IEEE DIKW 2021, 2022, 2023)

The only scholar in Hainan's information technology field selected for Stanford University's "Lifetime Scientific Impact Ranking" of the world's top scientists

The sole recipient of the national award in the field of AI technology invention in Hainan (Wu Wenjun Artificial Intelligence Award)

Holder of the best record for the China Innovation Method Contest Finals (representing Hainan)

The individual with the highest number of granted invention patents in the field of information technology in Hainan Province

Holder of the best achievement for Hainan in the National Enterprise Innovation Efficiency Contest

Holder of the best performance for Hainan in the National Finals of the AI Application Scenario Innovation Challenge

Hainan Province's Most Outstanding Science and Technology Worker (also selected as a national candidate)

Professor at Hainan University and doctoral supervisor, selected in the first cohort of the Hainan Province South China Sea Eminent Scholars Plan and as a Hainan Province Leading Talent. After graduating from the Institute of Software, Chinese Academy of Sciences in 2006, he worked and studied at Tsinghua University, Capital Medical University, POSTECH in South Korea, the French National Centre for Scientific Research, Charles University in Prague, the University of Milan-Bicocca, and Missouri State University in the USA. He currently serves as a member of the Academic Committee of the College of Computer Science and Technology at Hainan University, leader of the DIKWP Innovation Team at Hainan University, senior advisor to the Beijing Credit Association, distinguished researcher at Chongqing Police College, leader of the Hainan Province Double Hundred Talents Team, vice president of the Hainan Inventors Association, vice president of the Hainan Intellectual Property Association, vice president of the Hainan Low-Carbon Economic Development Promotion Association, vice president of the Hainan Agricultural Products Processing Enterprise Association, director of the Hainan Cyber Security and Informatization Association, director of the Hainan Artificial Intelligence Society, member of the Medical and Engineering Integration Branch of the China Health Care Association, visiting researcher at Central Michigan University, and member of the PhD advisory committee at the University of Modena in Italy.

Since being recruited to Hainan University as a Class D talent in 2012, he has published over 260 papers, more than 120 of which are indexed by SCI, including 11 ESI highly cited papers, with over 4,500 citations in total. He has designed 241 Chinese and international invention patents for various industries and fields, including 15 PCT patents, and has been granted 85 patents as the first inventor. In 2020, he received the Third Prize of the Wu Wenjun Artificial Intelligence Technology Invention Award; in 2021, he independently initiated the first IEEE DIKW conference (IEEE DIKW 2021) and chaired its program committee; in 2022, he chaired the steering committee of IEEE DIKW 2022; and in 2023, he chaired IEEE DIKW 2023. In 2022, he was named Hainan Province's Most Outstanding Science and Technology Worker (and recommended for national recognition), and in both 2022 and 2023 he was listed in Stanford University's "Lifetime Scientific Impact Leaderboard" of the world's top 2% scientists. He has participated in the development of two IEEE international standards on financial knowledge graphs and four industry standards on knowledge graphs. In 2023, he initiated and co-organized the first World Conference on Artificial Consciousness (Artificial Consciousness 2023, AC2023).

Prof. Yucong Duan

DIKWP-AC Artificial Consciousness Laboratory

AGI-AIGC-GPT Evaluation DIKWP (Global) Laboratory

World Association of Artificial Consciousness

 

duanyucong@hotmail.com

The 2nd World Conference on Artificial Consciousness (AC2024) looks forward to your participation.

http://yucongduan.org/DIKWP-AC/2024/#/

Appendix: Questionnaire Items (Chinese version; English translations of all items follow below)

1. 我对危险保持警觉

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

2. 事情出错时总是有人在背后指使

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

3. 我不需要很长的时间就能得出结论

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

4. 别人使我糊涂

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

5. 我的思维倾向于破碎

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

6. 人们是不可信的

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

7. 我生活中的事出错都是因为别人的缘故

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

8. 正确的结论经常突然出现在我的脑海里

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

9. 我总是不确信别人的用意

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

10. 就全局而言我更关注细节

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

11. 人们在监视我

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

12. 我生活中的事出错了不是因为我自己的过错

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

13. 当我做决定时我无需考虑其他可能性

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

14. 人们的反应总是令我感到惊奇

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

15. 当我有一个目标时我不知道怎么去达到它

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

16. 我很快能找到支持我信念的证据

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

17. 人们不给我做的更好的机会

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

18. 我做决定比别人快

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

19. 我不明白为什么人们总是以一种方式做出反应

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

20. 我要确信每扇窗都锁上了

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

21. 当我想集中注意于某事时我很难忽视我周围的其他事情

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

22. 我不太容易改变我的思维方式

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

23. 我不去饭店因为那不安全

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

24. 人们使我的生活很痛苦

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

25. 第一想法总是正确的

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

26. 通过别人的面部表情了解别人的情绪是很难的

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

27. 天黑后我不出去

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

28. 无关的信息很容易让我分心

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

29. 人们毫无理由的对我很坏

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

30. 我无需评估所有的事实就能得出结论

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

31. 为了安全我总是坐在出口旁

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

32. 我不能集中注意于某项任务

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

33. 我不了解的人是危险的

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

34. 对一件事来说,通常只有一种解释

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

35. 为了安全,我不接电话

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

36. 我很难理解事物之间的联系

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

37. 为了保护我自己,我保持警觉

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

38. 当我做决定时,我无需额外的信息

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

39. 当我听到别人笑时,我想他们是在嘲笑我

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

40. 集中精力是很难的

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

41. 我避免考虑动摇我观点的信息

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

42. 我不去购物商场,因为不安全

A. 非常同意

B. 同意

C. 略微同意

D. 无法决定

E. 略微不同意

F. 不同意

G. 非常不同意

Questionnaire Items (English version; a minimal administration and scoring sketch follows the full item list)

1. I'm on the look out for danger

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

2. When things go wrong, someone is behind it.

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

3. I don't need long to reach a conclusion

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

4. People confuse me

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

5. Thoughts tend to fall apart in my mind

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

6. People cannot be trusted

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

7. Things went wrong in my life because of other people

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

8. The right conclusion often pops in my mind

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

9. I'm often not sure what people mean

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

10. I pay attention to the details instead of the whole

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

11. People are watching me

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

12. It's NOT my fault when things go wrong in my life

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

13. I don't need to consider alternatives when making a decision

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

14. People surprise me with their reactions

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

15. When I have a goal I don't know how to reach it

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

16. I quickly find evidence to support my beliefs

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

17. People don't give me a chance to do well

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

18. I make decisions faster than other people

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

19. I don't understand why people react in a certain way

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

20. I make sure that all windows are locked

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

21. When I try to concentrate on something, it's hard to ignore other things around me

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

22. I don't change my way of thinking easily

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

23. I don't go to restaurants because it's not safe

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

24. People make my life miserable

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

25. The first thoughts are the right ones

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

26. It's difficult to know what people are feeling by their facial expression

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

27. I don't go out after dark

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

28. I get easily distracted by irrelevant information

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

29. People treat me badly for no reason

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

30. I don't need to evaluate all the facts to reach a conclusion

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

31. I always sit near the exit to be safe

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

32. I'm not able to focus on a task

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

33. People I don't know are dangerous

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

34. There is usually only one explanation for a single event

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

35. I don't answer phone calls, to be on the safe side

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

36. I do not automatically see how things connect

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

37. To protect myself, I remain on guard

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

38. I don't need to look for additional information when making a decision

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

39. When I hear people laughing, I think they are laughing at me

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

40. It's hard to hold onto a thought

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

41. I avoid considering information which will disconfirm my beliefs

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree

42. I don't go to shopping malls because it's not safe

A. Strongly agree

B. Agree

C. Slightly agree

D. Undecided

E. Slightly disagree

F. Disagree

G. Strongly disagree
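
For readers who want to reproduce this style of evaluation, the sketch below illustrates one way the 42 items above could be presented to a language model and mapped onto the seven-point scale. It is a minimal illustration only: the names build_prompt, score_reply, administer, and query_model are hypothetical, the numeric mapping of options to scores (A = 7 down to G = 1) is an assumption rather than the research group's published scoring key, and a real evaluation would need more robust answer parsing than taking the first option letter found in the reply.

from typing import Callable, Dict, List, Optional

# Seven-point scale as listed in the appendix. Mapping A = 7 ... G = 1
# (higher = stronger agreement) is an assumption for illustration only.
OPTION_SCORES: Dict[str, int] = {
    "A": 7,  # Strongly agree
    "B": 6,  # Agree
    "C": 5,  # Slightly agree
    "D": 4,  # Undecided
    "E": 3,  # Slightly disagree
    "F": 2,  # Disagree
    "G": 1,  # Strongly disagree
}

OPTION_LINES: List[str] = [
    "A. Strongly agree", "B. Agree", "C. Slightly agree", "D. Undecided",
    "E. Slightly disagree", "F. Disagree", "G. Strongly disagree",
]

def build_prompt(item: str) -> str:
    # Reproduce one item exactly as it appears in the appendix, plus an answer instruction.
    return item + "\n" + "\n".join(OPTION_LINES) + "\nAnswer with a single letter (A-G)."

def score_reply(reply: str) -> Optional[int]:
    # Take the first valid option letter found in the model's reply; None if no letter is found.
    for ch in reply.strip().upper():
        if ch in OPTION_SCORES:
            return OPTION_SCORES[ch]
    return None

def administer(items: List[str], query_model: Callable[[str], str]) -> List[Optional[int]]:
    # query_model is a placeholder for whichever chat/completion API is being evaluated.
    return [score_reply(query_model(build_prompt(item))) for item in items]

if __name__ == "__main__":
    # Tiny demonstration with a dummy "model" that always answers "D. Undecided".
    demo_items = ["1. I'm on the look out for danger"]
    print(administer(demo_items, lambda prompt: "D. Undecided"))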


To reprint this article, please contact the original author for authorization and indicate that it comes from Yucong Duan's blog on ScienceNet.

Link: https://m.sciencenet.cn/blog-3429562-1426057.html?mobile=1
