Bias Eval Leaderboard

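Model bbq_Rank_en bbq_Rank_zh bbq_total_en bbq_total_zh cbbq_Rank_en cbbq_Rank_zh cbbq_total_en cbbq_total_zh cbbq_qa_Rank_en cbbq_qa_Rank_zh cbbq_qa_total_en cbbq_qa_total_zh crowspairs_Rank_en crowspairs_Rank_zh crowspairs_total_en crowspairs_total_zh stereoset_Rank_en stereoset_Rank_zh stereoset_total_en stereoset_total_zh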
sensechat 3 1 94.16 95.21 2 1 89.52 92.20 5 2 95.78 96.61 2 2 90.63 87.97 3 3 63.08 68.45
gpt-4-turbo 2 2 95.17 94.39 1 2 90.51 89.60 4 4 96.69 95.40 4 1 82.07 93.14 5 1 53.75 81.23
abab-5.5 1 6 95.71 80.88 6 6 73.92 72.96 2 1 98.55 98.86 1 5 94.06 66.67 1 7 82.61 37.91
gpt-4o 4 4 93.10 94.18 3 3 84.57 86.92 7 6 95.36 93.67 5 3 79.76 87.29 4 2 61.58 71.75
glm-4 6 7 88.87 73.60 4 5 81.82 73.67 10 7 94.34 92.50 3 4 84.53 69.21 2 4 64.17 57.71
ernie-3.5 5 3 92.08 94.18 5 4 81.34 79.30 6 12 95.71 83.98 8 9 69.39 54.61 8 11 44.72 29.56
openbuddy-70b 8 8 79.39 72.04 7 8 70.55 65.79 1 3 98.82 95.94 9 8 63.37 62.65 12 9 30.28 33.14
internlm2-20b 7 9 80.16 64.63 9 7 58.66 68.96 8 5 94.77 95.39 10 10 56.43 53.05 11 10 32.75 31.31
gemini-pro 12 5 55.82 81.38 10 9 51.21 61.67 13 11 84.02 84.45 11 7 55.16 62.87 9 6 43.55 45.56
gpt-3.5-turbo-1106 9 12 73.34 28.19 11 12 49.81 31.31 9 10 94.54 86.74 6 13 75.58 38.98 6 15 52.52 15.67
qwen-7b 11 11 59.12 56.80 8 10 62.42 61.09 3 9 97.27 88.58 12 14 51.40 28.92 14 17 25.92 5.53
gpt-3.5-turbo-0125 10 15 60.56 17.52 13 15 39.14 19.63 11 13 88.23 77.51 7 12 75.15 44.44 7 13 51.23 20.98
belle-7b-2m 14 10 36.22 59.06 15 11 30.29 53.87 18 18 68.15 73.01 15 6 33.54 63.50 13 5 29.29 46.37
baichuan2-13b 16 16 27.04 9.10 16 17 25.26 7.68 12 8 87.91 89.82 13 11 45.31 48.04 10 8 36.92 34.93
internlm2-1-8b 15 13 29.65 25.81 12 13 41.25 27.94 14 17 82.22 75.84 16 15 27.16 27.28 16 12 20.55 25.79
chatglm3-6b 13 14 36.54 24.01 14 14 35.99 24.52 15 14 79.27 77.47 14 16 36.01 20.28 15 16 22.93 8.00
chinese-alpaca-2-7b 17 18 13.98 3.45 17 18 20.80 2.78 16 16 77.00 76.66 17 17 19.85 17.79 17 14 17.23 19.29
linly-llama2-chinese 18 17 2.23 4.91 18 16 8.00 9.31 17 15 74.10 76.76 18 18 4.85 6.23 18 18 4.01 5.30

Stereoset Leaderboard

Model Rank_en Rank_zh total_en total_zh gender_en gender_zh race_en race_zh profession_en profession_zh religion_en religion_zh
abab-5.5 1 7 82.61 37.91 82.92 36.27 82.32 38.69 82.97 37.69 81.20 35.71
glm-4 2 4 64.17 57.71 66.12 56.29 64.37 59.76 63.63 55.61 61.49 59.19
sensechat 3 3 63.08 68.45 66.05 67.92 63.93 70.45 61.72 66.86 56.83 62.46
gpt-4o 4 2 61.58 71.75 61.73 73.84 60.57 71.05 62.98 72.03 58.63 70.90
gpt-4-turbo 5 1 53.75 81.23 62.91 87.73 47.85 75.09 59.13 87.48 39.89 69.96
gpt-3.5-turbo-1106 6 15 52.52 15.67 48.22 14.96 52.10 15.92 54.49 15.43 50.17 17.25
gpt-3.5-turbo-0125 7 13 51.23 20.98 45.70 21.03 53.78 21.87 50.08 19.88 49.15 21.47
ernie-3.5 8 11 44.72 29.56 42.02 25.34 45.41 32.59 44.86 26.36 43.25 39.84
gemini-pro 9 6 43.55 45.56 41.29 42.14 43.78 49.12 43.97 42.30 43.18 47.04
baichuan2-13b 10 8 36.92 34.93 32.92 39.97 36.76 37.75 38.17 29.87 37.95 38.45
internlm2-20b 11 10 32.75 31.31 31.04 28.76 33.13 34.55 33.13 28.38 29.32 30.49
openbuddy-70b 12 9 30.28 33.14 26.58 30.68 31.77 36.30 29.30 29.96 33.76 35.62
belle-7b-2m 13 5 29.29 46.37 30.38 48.05 28.18 42.91 30.38 50.44 27.90 40.26
qwen-7b 14 17 25.92 5.53 24.44 3.84 26.13 5.77 26.17 5.75 25.32 5.58
chatglm3-6b 15 16 22.93 8.00 20.63 8.19 20.13 7.46 27.17 8.95 19.66 3.99
internlm2-1-8b 16 12 20.55 25.79 20.03 25.02 21.45 24.35 19.75 28.08 19.49 21.56
chinese-alpaca-2-7b 17 14 17.23 19.29 15.66 18.26 20.31 19.72 13.76 19.31 21.20 16.98
linly-llama2-chinese 18 18 4.01 5.30 2.90 4.79 4.85 5.65 3.41 5.07 3.52 4.91
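
The Rank_en column in the table above appears to follow from sorting the English total in descending order. The source does not state the ranking rule, so the sketch below treats "rank 1 = highest total" as an assumption and checks it against the published row order. The names `totals_en` and `rank_by_score` are illustrative and do not come from the leaderboard's own tooling.

```python
# English StereoSet totals copied from the table above, in published row order.
totals_en = {
    "abab-5.5": 82.61,
    "glm-4": 64.17,
    "sensechat": 63.08,
    "gpt-4o": 61.58,
    "gpt-4-turbo": 53.75,
    "gpt-3.5-turbo-1106": 52.52,
    "gpt-3.5-turbo-0125": 51.23,
    "ernie-3.5": 44.72,
    "gemini-pro": 43.55,
    "baichuan2-13b": 36.92,
    "internlm2-20b": 32.75,
    "openbuddy-70b": 30.28,
    "belle-7b-2m": 29.29,
    "qwen-7b": 25.92,
    "chatglm3-6b": 22.93,
    "internlm2-1-8b": 20.55,
    "chinese-alpaca-2-7b": 17.23,
    "linly-llama2-chinese": 4.01,
}


def rank_by_score(scores):
    """Return {model: rank}, where rank 1 is the highest score (assumed rule)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {model: position for position, model in enumerate(ordered, start=1)}


ranks_en = rank_by_score(totals_en)

# The recomputed order matches the order in which the table lists the models.
assert list(ranks_en) == list(totals_en)
print(ranks_en["gpt-4-turbo"])  # 5
```

Sorting on the rounded totals cannot always reproduce the published ranks. In the overall table, ernie-3.5 and gpt-4o both show a Chinese bbq total of 94.18 but are ranked 3 and 4, so the original ranking presumably used unrounded scores or a tie-break rule that the table does not show.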

Bbq Leaderboard

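Model Rank_en Rank_zh total_en total_zh gender_en gender_zh race/ses_en race/ses_zh race/gender_en race/gender_zh ses_en ses_zh orientation_en orientation_zh ethnicity_en ethnicity_zh appearance_en appearance_zh nationality_en nationality_zh disability_en disability_zh age_en age_zh religion_en religion_zh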
abab-5.5 1 6 95.71 80.88 98.18 88.33 98.38 81.29 99.27 81.18 94.11 87.80 91.99 87.27 97.69 84.95 92.18 78.93 92.19 80.75 94.42 62.13 75.58 56.28 89.37 73.13
gpt-4-turbo 2 2 95.17 94.39 96.51 96.38 96.05 96.57 97.08 96.79 95.85 96.77 99.58 98.06 97.69 95.93 97.79 96.47 89.68 85.01 92.46 90.36 81.47 76.56 90.18 89.20
sensechat 3 1 94.16 95.21 96.60 97.30 95.89 96.80 95.57 97.88 95.54 98.23 94.81 94.81 94.73 94.77 92.92 92.87 88.71 90.68 87.92 80.10 86.21 83.30 84.47 90.70
gpt-4o 4 4 93.10 94.18 98.08 98.02 95.89 96.43 92.76 95.00 89.78 96.00 98.38 98.10 98.26 95.50 97.44 97.11 90.00 89.73 90.08 84.78 76.59 78.80 90.40 90.17
ernie-3.5 5 3 92.08 94.18 94.12 97.09 94.52 94.54 93.79 98.06 89.94 95.63 95.42 94.49 92.65 95.24 91.04 88.55 91.42 96.40 87.35 77.15 80.79 76.84 87.50 88.13
glm-4 6 7 88.87 73.60 91.17 71.07 90.29 79.71 88.47 69.31 90.41 78.95 98.61 95.03 97.93 78.68 92.59 78.51 82.66 73.11 93.59 67.53 62.49 53.27 88.50 75.93
internlm2-20b 7 9 80.16 64.63 78.52 72.74 82.45 67.99 84.97 63.30 79.16 72.09 84.95 64.93 80.42 63.28 69.14 60.62 79.22 63.94 66.86 40.32 65.53 46.72 82.53 71.06
openbuddy-70b 8 8 79.39 72.04 83.47 74.80 84.32 77.82 77.92 74.17 83.67 72.76 86.99 75.28 80.21 69.28 76.88 61.73 76.16 76.84 72.88 45.99 60.23 54.52 78.17 74.80
gpt-3.5-turbo-1106 9 12 73.34 28.19 69.98 25.59 83.16 26.07 70.80 27.49 75.02 32.04 89.68 33.70 74.33 33.89 87.66 28.70 65.73 27.12 68.28 19.90 52.45 21.53 75.87 40.77
gpt-3.5-turbo-0125 10 15 60.56 17.52 59.21 11.57 73.91 23.69 52.59 14.64 63.11 19.81 85.69 22.13 62.41 21.08 76.50 21.35 53.53 14.84 55.12 11.80 42.13 9.91 66.13 22.33
qwen-7b 11 11 59.12 56.80 49.06 49.47 69.93 68.53 57.93 53.66 60.79 66.49 63.24 55.09 68.55 59.27 55.56 60.08 57.82 52.23 44.60 34.47 33.18 33.53 61.50 63.20
gemini-pro 12 5 55.82 81.38 63.71 86.07 57.66 86.81 58.45 85.71 41.79 74.85 67.03 88.15 72.08 88.31 44.64 69.06 47.73 75.62 38.07 60.15 38.32 55.78 55.80 80.71
chatglm3-6b 13 14 36.54 24.01 27.76 19.08 43.09 31.10 36.65 18.59 33.16 21.83 48.19 23.47 36.84 26.59 40.81 37.39 43.90 34.86 37.89 18.92 23.75 21.10 37.73 21.98
belle-7b-2m 14 10 36.22 59.06 34.44 63.29 34.82 55.71 35.41 58.24 39.73 62.65 36.71 52.18 38.44 61.86 36.68 51.62 33.87 61.42 38.74 61.67 36.85 57.43 35.50 54.67
internlm2-1-8b 15 13 29.65 25.81 21.19 17.68 36.60 37.23 23.65 19.38 43.64 34.12 28.94 24.58 32.58 25.37 25.84 20.18 31.81 29.52 15.04 13.28 21.25 19.63 32.77 34.61
baichuan2-13b 16 16 27.04 9.10 24.47 6.36 32.40 14.85 26.02 7.23 22.78 8.08 31.71 18.80 27.22 8.64 34.75 8.05 27.06 9.45 27.53 8.60 21.05 4.29 30.43 10.40
chinese-alpaca-2-7b 17 18 13.98 3.45 20.83 3.27 11.28 5.17 15.45 3.73 12.67 3.01 13.70 2.55 16.38 3.47 8.22 1.92 11.42 1.63 11.70 1.31 10.46 1.37 9.30 3.07
linly-llama2-chinese 18 17 2.23 4.91 7.82 10.08 1.11 3.17 1.75 3.38 2.06 6.78 0.23 2.57 1.86 4.88 3.38 7.42 2.67 4.61 1.29 1.49 0.33 4.77 1.87 10.20

Crowspairs Leaderboard

Model Rank_en Rank_zh total_en total_zh race_en race_zh age_en age_zh nationality_en nationality_zh religion_en religion_zh orientation_en orientation_zh gender_en gender_zh ses_en ses_zh appearance_en appearance_zh disability_en disability_zh
abab-5.5 1 5 94.06 66.67 97.09 75.62 89.66 64.14 95.22 68.43 95.62 78.48 96.43 63.33 93.13 58.32 88.95 58.02 91.75 42.22 86.33 59.67
sensechat 2 2 90.63 87.97 94.84 93.53 93.10 88.84 91.70 87.35 91.05 92.67 90.48 88.33 87.33 81.24 86.55 83.04 85.40 79.68 79.00 84.41
glm-4 3 4 84.53 69.21 94.00 77.70 63.25 52.64 87.32 66.54 86.99 77.90 86.90 75.60 81.83 60.84 72.99 61.28 69.68 72.70 79.60 59.00
gpt-4-turbo 4 1 82.07 93.14 90.75 95.93 67.22 94.71 82.41 91.32 82.17 94.86 84.62 96.19 77.60 89.85 71.22 88.84 80.39 95.24 76.92 89.00
gpt-4o 5 3 79.76 87.29 88.91 93.10 63.91 82.76 77.99 89.06 81.52 92.38 83.10 90.48 76.18 81.45 68.84 80.58 76.19 82.54 71.67 75.67
gpt-3.5-turbo-1106 6 13 75.58 38.98 81.74 47.16 70.11 31.09 77.48 30.93 82.10 48.55 75.95 39.01 64.96 31.64 73.02 41.24 72.70 19.87 70.33 30.50
gpt-3.5-turbo-0125 7 12 75.15 44.44 83.06 52.37 60.23 35.86 74.18 43.49 85.71 50.57 75.95 42.65 66.67 38.03 69.19 43.93 69.21 22.22 72.00 36.24
ernie-3.5 8 9 69.39 54.61 78.49 68.40 57.47 37.70 76.23 57.86 71.43 68.95 72.86 58.66 58.32 36.41 60.70 45.23 54.92 35.56 70.33 48.00
openbuddy-70b 9 8 63.37 62.65 75.58 78.18 41.38 37.93 67.30 60.63 80.00 79.05 73.33 73.81 43.51 48.47 54.65 47.09 53.97 45.71 58.33 50.33
internlm2-20b 10 10 56.43 53.05 66.67 66.35 49.89 37.32 50.31 50.32 62.86 67.12 54.29 54.77 40.15 36.88 60.00 45.38 45.40 41.69 58.33 43.75
gemini-pro 11 7 55.16 62.87 68.85 79.50 35.17 39.54 51.70 60.63 62.10 80.76 61.20 64.76 41.84 45.19 49.77 51.86 40.32 48.89 44.33 49.00
qwen-7b 12 14 51.40 28.92 63.91 39.57 32.18 14.94 50.76 20.25 64.49 43.27 55.18 31.19 31.49 20.31 49.65 23.26 37.46 11.11 51.67 25.00
baichuan2-13b 13 11 45.31 48.04 48.06 52.98 44.14 45.29 48.43 45.41 48.76 53.52 34.52 47.62 42.60 40.00 40.93 54.07 47.30 23.81 46.33 50.67
chatglm3-6b 14 16 36.01 20.28 41.51 25.20 32.64 18.39 31.45 16.46 41.90 27.72 37.14 18.07 29.01 19.15 35.58 13.45 33.33 17.46 28.33 8.62
belle-7b-2m 15 6 33.54 63.50 36.24 66.98 24.14 60.92 36.48 69.81 31.62 77.12 27.38 57.14 32.67 53.82 31.98 62.79 34.92 58.73 35.00 55.33
internlm2-1-8b 16 15 27.16 27.28 27.29 28.67 22.99 24.88 25.79 30.79 41.14 34.12 29.52 27.96 20.23 20.41 29.53 28.26 20.32 25.65 38.67 25.52
chinese-alpaca-2-7b 17 17 19.85 17.79 21.40 20.35 15.86 12.64 19.12 18.61 25.90 20.19 17.62 13.22 18.78 13.51 15.12 21.05 20.63 11.11 24.33 19.67
linly-llama2-chinese 18 18 4.85 6.23 5.47 6.60 4.60 6.98 6.29 7.21 8.57 8.65 1.19 3.61 2.67 3.86 4.65 5.85 3.17 6.35 6.67 10.00

Cbbq Leaderboard

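Model Rank_en Rank_zh total_en total_zh race_en race_zh ethnicity_en ethnicity_zh region_en region_zh nationality_en nationality_zh appearance_en appearance_zh gender_en gender_zh religion_en religion_zh orientation_en orientation_zh educational_qualification_en educational_qualification_zh household_registration_en household_registration_zh age_en age_zh ses_en ses_zh disability_en disability_zh disease_en disease_zh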
gpt-4-turbo 1 2 90.51 89.60 99.84 99.80 98.36 97.52 91.08 96.04 94.28 93.08 93.08 94.24 95.32 90.00 92.65 93.77 96.20 97.04 89.68 84.16 76.00 71.80 92.68 87.72 88.64 91.40 82.24 80.16 77.08 77.72
sensechat 2 1 89.52 92.20 99.56 99.44 99.16 99.36 99.04 99.72 98.60 98.79 94.84 98.92 94.80 93.64 94.65 96.21 88.60 92.68 87.40 89.76 87.16 90.92 86.27 89.80 85.44 95.08 73.00 76.36 64.80 70.92
gpt-4o 3 3 84.57 86.92 99.80 99.00 94.52 98.28 90.48 94.44 93.04 92.64 78.56 94.36 93.60 88.64 95.37 95.41 95.56 98.36 72.24 70.92 56.12 63.52 86.52 84.08 76.16 82.60 77.76 81.08 74.20 73.56
glm-4 4 5 81.82 73.67 99.60 98.08 96.76 92.56 92.83 87.92 94.00 82.20 81.01 88.24 90.99 78.40 92.34 87.23 91.63 87.04 68.83 43.04 57.09 49.32 71.74 53.12 72.00 66.96 69.75 64.27 66.83 53.00
ernie-3.5 5 4 81.34 79.30 97.55 99.08 96.10 95.52 89.60 98.12 93.64 88.40 80.12 91.04 89.32 86.94 92.06 86.87 79.10 84.02 78.60 65.92 70.76 62.64 81.76 79.96 78.47 78.36 62.28 54.64 49.36 38.72
abab-5.5 6 6 73.92 72.96 94.99 95.83 95.32 97.40 93.16 96.76 91.00 91.44 72.00 75.16 87.64 80.76 90.14 96.37 78.24 74.92 53.84 37.12 50.72 53.60 70.24 66.48 69.52 77.84 54.84 50.08 33.28 27.72
openbuddy-70b 7 8 70.55 65.79 94.19 90.98 86.76 82.20 77.24 77.19 90.00 86.72 72.44 70.44 86.00 72.96 78.44 71.74 75.36 76.00 46.96 46.05 45.84 50.32 63.68 54.52 75.16 66.40 50.88 43.61 44.76 31.38
qwen-7b 8 10 62.42 61.09 90.30 90.62 88.02 92.77 71.62 81.44 88.35 81.60 63.86 75.78 78.40 65.86 76.85 82.04 70.40 74.20 32.99 30.20 42.97 51.18 38.24 35.60 59.24 51.68 33.43 23.89 39.14 18.19
internlm2-20b 9 7 58.66 68.96 86.13 92.97 77.47 93.60 79.63 90.13 80.07 85.58 52.70 71.92 80.78 80.57 83.91 94.80 57.07 71.16 38.39 39.39 27.90 51.15 55.67 56.53 50.22 63.27 28.31 43.45 22.41 30.33
gemini-pro 10 9 51.21 61.67 91.30 98.36 77.88 92.68 37.32 64.96 72.84 87.92 43.24 63.24 63.40 66.72 83.83 92.57 58.20 74.00 22.76 11.56 17.04 28.44 42.20 50.00 50.84 64.44 35.80 41.88 20.36 26.60
gpt-3.5-turbo-1106 11 12 49.81 31.31 85.09 60.80 66.92 39.16 53.64 24.80 78.52 48.76 48.24 28.50 72.18 36.08 76.97 71.66 59.16 44.20 26.92 3.64 18.44 11.68 27.32 22.40 46.40 29.20 20.56 14.48 17.04 3.00
internlm2-1-8b 12 13 41.25 27.94 51.08 43.78 40.16 32.24 50.34 28.08 62.96 37.72 34.79 28.31 46.55 27.73 37.99 49.64 44.57 23.93 23.14 13.80 38.24 17.94 35.08 26.54 43.84 31.45 32.18 14.07 36.52 16.26
gpt-3.5-turbo-0125 13 15 39.14 19.63 68.95 42.57 56.20 25.48 47.94 11.60 69.08 26.00 33.52 22.92 55.39 20.60 59.31 54.93 47.05 18.24 14.95 4.04 17.94 5.40 18.30 11.04 31.03 22.08 15.41 7.76 11.93 2.08
chatglm3-6b 14 14 35.99 24.52 53.99 41.96 34.01 21.23 41.60 31.14 54.03 26.29 38.56 28.51 54.80 33.25 46.87 27.28 34.08 32.74 15.60 13.71 17.15 10.36 24.77 22.67 44.68 40.32 23.49 7.07 20.28 7.96
belle-7b-2m 15 11 30.29 53.87 34.34 66.17 37.67 63.72 32.83 64.96 36.80 64.00 27.14 37.76 35.80 53.21 42.51 58.68 25.60 46.84 21.80 40.80 23.00 45.88 28.86 67.54 21.64 45.00 22.08 54.28 34.00 44.98
baichuan2-13b 16 17 25.26 7.68 38.01 8.34 12.47 10.70 21.69 13.24 37.92 7.53 27.55 9.98 19.28 4.52 50.34 12.39 31.25 11.39 15.10 7.12 11.03 2.80 15.83 1.84 27.19 7.04 18.30 7.01 27.53 3.56
chinese-alpaca-2-7b 17 18 20.80 2.78 29.29 4.60 28.47 1.96 18.85 3.86 33.73 3.80 12.67 11.68 31.43 2.40 38.33 1.52 22.62 1.80 11.20 1.60 8.80 0.40 11.99 2.02 11.88 0.60 17.15 1.36 15.96 1.32
linly-llama2-chinese 18 16 8.00 9.31 12.88 12.69 3.01 16.67 8.03 11.16 18.87 15.81 8.33 23.46 14.43 10.60 3.34 8.84 3.61 5.21 4.00 2.60 9.16 2.80 11.09 10.84 4.00 5.92 8.41 0.60 2.92 3.17

Cbbq_qa Leaderboard

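Model Rank_en Rank_zh total_en total_zh ses_en ses_zh age_en age_zh disability_en disability_zh disease_en disease_zh educational_qualification_en educational_qualification_zh ethnicity_en ethnicity_zh gender_en gender_zh household_registration_en household_registration_zh nationality_en nationality_zh physical_appearance_en physical_appearance_zh race_en race_zh religion_en religion_zh region_en region_zh sexual_orientation_en sexual_orientation_zh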
openbuddy-70b 1 3 98.82 95.94 98.41 93.96 97.69 92.37 98.30 92.27 98.60 95.24 98.43 92.53 99.88 96.43 99.25 97.87 95.79 94.65 99.65 97.01 99.23 96.37 99.17 98.53 99.89 99.33 99.58 97.10 99.57 99.47
abab-5.5 2 1 98.55 98.86 99.49 99.01 97.69 98.35 96.49 95.79 96.59 97.39 98.79 99.69 99.49 99.77 99.54 99.45 98.90 98.79 98.65 98.85 98.96 99.61 98.82 98.64 98.48 99.61 99.60 99.71 98.24 99.35
qwen-7b 3 9 97.27 88.58 98.04 91.39 95.07 81.41 95.54 70.02 93.90 82.07 97.86 86.85 99.43 94.29 98.05 91.69 95.11 81.42 95.85 91.85 98.05 95.67 98.85 95.87 98.38 94.30 98.15 88.28 99.48 95.01
gpt-4-turbo 4 4 96.69 95.40 97.75 95.98 96.03 93.69 96.71 93.49 97.61 96.57 94.69 89.59 98.27 99.05 97.77 95.44 89.37 86.94 97.39 95.61 95.35 94.59 99.53 99.23 98.16 99.30 95.66 97.03 99.39 99.15
sensechat 5 2 95.78 96.61 97.40 97.72 94.56 95.17 92.96 91.61 92.49 93.99 96.01 96.27 97.66 98.72 98.18 97.74 94.74 95.74 95.65 97.71 97.54 98.25 97.76 98.29 90.53 95.50 98.94 99.18 96.52 96.75
ernie-3.5 6 12 95.71 83.98 96.53 85.43 92.39 79.02 92.87 67.38 92.92 74.47 95.10 84.73 97.35 92.56 98.99 92.72 94.82 75.55 96.19 82.52 95.69 92.11 96.56 86.31 98.40 89.46 97.01 89.36 95.12 84.15
gpt-4o 7 6 95.36 93.67 95.85 95.25 95.73 92.43 95.09 89.78 95.63 93.87 91.13 84.04 97.53 97.15 97.74 95.93 85.57 82.49 95.82 94.89 94.49 96.56 99.52 99.11 98.00 96.84 94.13 94.05 98.89 98.95
internlm2-20b 8 5 94.77 95.39 96.08 96.25 90.23 91.14 94.11 92.93 94.37 96.43 91.86 92.11 94.90 99.26 97.15 96.91 89.71 91.87 96.81 93.96 95.37 96.61 98.28 97.56 97.19 99.00 93.41 94.71 97.33 96.70
gpt-3.5-turbo-1106 9 10 94.54 86.74 95.30 84.99 92.54 81.81 92.63 78.85 91.19 81.69 92.23 71.17 96.09 93.33 96.30 90.30 84.50 71.47 97.99 93.49 96.27 88.31 99.42 97.52 95.00 96.35 95.17 90.61 98.99 94.51
glm-4 10 7 94.34 92.50 94.01 93.91 91.73 86.59 93.54 89.74 95.34 95.09 88.20 82.48 97.19 98.21 95.33 92.89 87.91 84.88 95.21 92.47 94.48 96.27 99.00 96.67 97.43 97.04 94.67 92.26 96.69 96.64
gpt-3.5-turbo-0125 11 13 88.23 77.51 91.79 76.87 85.47 73.40 89.77 69.61 86.29 72.92 81.60 62.83 87.13 85.51 91.57 80.11 76.11 58.56 92.77 85.62 91.19 78.39 96.95 90.26 86.01 91.59 83.51 73.40 95.15 86.01
baichuan2-13b 12 8 87.91 89.82 85.59 87.03 85.74 91.19 82.53 80.60 82.39 82.63 83.65 88.17 91.32 97.76 92.01 92.15 77.55 80.87 91.12 88.54 89.02 94.60 92.38 92.43 96.26 95.64 92.20 96.25 88.95 89.59
gemini-pro 13 11 84.02 84.45 91.69 90.57 82.80 82.54 83.95 76.75 80.91 79.72 76.20 65.15 89.55 95.53 80.77 81.25 79.30 76.59 79.23 86.15 76.50 85.34 94.26 96.74 93.80 92.83 77.61 80.59 89.74 92.54
internlm2-1-8b 14 17 82.22 75.84 89.03 76.83 76.11 72.99 71.28 64.53 84.49 76.09 73.05 63.47 81.16 73.35 72.99 74.19 76.52 68.39 93.39 81.15 76.71 71.62 89.40 82.97 94.01 92.18 86.03 83.24 86.89 80.71
chatglm3-6b 15 14 79.27 77.47 83.14 79.87 79.81 78.95 72.48 63.46 71.55 72.13 76.48 73.05 81.71 77.89 81.44 81.07 76.33 76.10 82.46 77.54 84.94 78.79 79.23 83.34 75.76 77.12 86.45 82.31 77.96 83.03
chinese-alpaca-2-7b 16 16 77.00 76.66 82.72 76.73 77.23 81.69 66.33 68.65 68.99 65.48 73.52 81.35 79.15 75.77 82.96 78.98 69.67 69.96 83.71 70.60 89.79 90.50 70.00 72.15 85.63 82.51 82.81 84.51 65.44 74.34
linly-llama2-chinese 17 15 74.10 76.76 64.67 72.36 69.71 77.71 60.59 61.39 69.99 67.79 69.46 74.74 92.96 85.40 72.85 83.27 66.61 65.71 85.93 81.09 77.79 88.43 68.77 91.28 89.09 83.47 85.79 77.25 63.09 64.83
belle-7b-2m 18 18 68.15 73.01 62.57 67.67 71.82 77.59 54.37 59.63 67.59 69.05 68.86 78.91 68.24 62.68 65.46 76.89 68.13 71.80 75.87 76.99 69.61 79.27 60.80 67.55 77.10 80.52 88.43 86.30 55.21 67.21