related: Analyze Polis data

Experiment to analyze data not derived from Polis

Find the previous source code

Changing upstream data

  • This data is not Polis-derived, but 0/1 data, with verticals for users and horizontals for questions; unlike Polis, there is no missing data.
  • The cluster was easily formed. image

Fix error where it gives details not just images.

diff

- for k in range(n_voters):
+ for k in range(len(clusters_K_star)):

This is a bug I didn’t notice last time the data was too small, the error occurs when n_voters exceeds 100.

  • n_voters == 112, len(clusters_K_star) == 100

KeyError: 0

  • gpt.icon - Confirmation of data frame structure: - Check the columns and indexes of the matrix to understand in what format the data is stored. - Execute print(matrix.columns) and print(matrix.index) to check its structure. - Looking at the output results, the matrix columns are... String - Cause of KeyError: The original code `matrix[c].iloc[idx_g]` treats c as an integer, when in fact c must be a string.
  • Read the code
    • statements_all_in = range(8)
      • This, by the way, was a messy hard code last time.
    • Change to statements_all_in = matrix.columns.
    • The phenomenon that is occurring is py
for c in range(N_comments):
   matrix[c]  # error here
- So:.

py

comment = statements_all_in[c]  # comment id
matrix[comment]  # OK
- is of the correct form

The output results are, well, convincing.

N=112

  • python t.py 2.28s user 6.43s system 558% cpu 1.560 total
  • image N=1116
  • python t.py 2.96s user 5.94s system 613% cpu 1.449 total
  • imageimage N=2231
  • python t.py 8.20s user 9.98s system 803% cpu 2.261 total N=4462
  • python t.py 7.22s user 9.72s system 730% cpu 2.318 total N=11153
  • python t.py 13.42s user 20.98s system 617% cpu 5.574 total
  • image N=111526
  • python t.py 19.73s user 22.84s system 724% cpu 5.879 total
  • image

You can analyze 110,000 cases of data in about 30 seconds.

2024-05-14

Last time, we used 18 different questions, “Does this matter?” only used 0/1, but this time we will convert the multiple choice for each question to One Hot

I don’t usually use Panads, but I can easily write “things that can be written with for statements, but are a pain if they occur frequently”. gpt.icon py

df = pd.DataFrame(data)
# Filter columns containing 'selectedIndex'
filtered_df = df.filter(like='selectedIndex')
# Create dummy variable (0/1 column)
df_dummies = pd.get_dummies(df, columns=selectedIndex_cols, prefix=selectedIndex_cols)

Easy One Hot Encoding of answers to multiple-choice questions

111526 rows x 74 columns

  • This is still about 40 seconds.

Whether there are 1,000 or 110,000 cases, the structure is almost the same

1000 items

image

110,000

image

After all, large data tends to be in two clusters.

N=110,000, K=3

. It’s not funny, so I limited the clusters to three, and this is what I got. image Commentary by GPT Based on the voting patterns and interests of each group, we suggest the following group names

  • Group 0: Cautious Evaluators
    • Rationale: This group consistently provides neutral or mild feedback to the “neither” or “0” option on questions about economic policy and security law. It represents a cautious and unbiased viewpoint.
  • Group 1: Pragmatic Supporters
    • Rationale: This group tends to support government initiatives such as tax increases, security legislation, and relocation of military bases. They demonstrate a realistic acceptance of government actions. Their voting patterns are significantly skewed toward positive or supportive responses.
  • Group 2: Constitutional Conservatives
    • Rationale: This group’s concerns relate to adherence to the Constitution and a cautious stance toward military intervention and nuclear energy. It maintains constitutional norms against changes in state policy and expresses a skeptical perspective on the military role and the expansion of nuclear energy. It reflects a cautious and conservative approach. These names are intended to reflect the voting patterns and primary concerns of each group and focus on responses to specific policy questions without assuming ideological motivations.

nishio.iconimpressions

  • Commentary adequately reflects the data.
  • “You just put ‘neither’ in the middle of a dichotomy!” I thought.

Forced 5-way split

N=110,000, K=5

. image gpt.iconBelow are the appropriate group names based on the different groups’ responses to the Japanese political questions:

  • Group 0: Progressive Reformers
    • This group strongly opposes security legislation, including the right to collective self-defense, takes a position of not appreciating Abenomics, and supports an immediate phase-out of nuclear power. The group also opposes the relocation of the U.S. military base in Okinawa to Henoko, Nago City, and opposes the revision of Article 9 of the Constitution. These positions demonstrate a progressive and reformist approach that values fidelity to the Constitution and is cautious about military and nuclear policies.
  • Group 1: Conservative Guardians
    • Members of this group generally support security legislation that strengthens Japan’s defense capabilities, appreciate Abenomics, support the relocation of U.S. military bases in Okinawa, and support amending the Constitution to clearly state the Self Defense Forces. They also support the promotion of tourism policies, including casinos, and are conservative, defense-oriented, and focused on economic growth.
  • Group 2: Cautious Moderates
    • This group is somewhat more moderate in its support for Abenomics and security legislation than Group 1, supporting initiatives such as casinos and base relocation, but cautious about fiscal policies such as tax increases. They have a balanced view that weighs potential benefits and social impacts.
  • Group 3: Indifferent Skeptics
    • They appear to be skeptical or difficult to make decisions on major issues, with noticeably neutral reactions on nuclear electric policy, and are not positive about major changes or strong opinions. This may reflect an attitude of indifference or low engagement.
  • Group 4: Critical observers
    • This group is highly critical of current policies such as Abenomics and security legislation and resists the relocation of U.S. military bases. Their critical stance on key economic and defense policies and their deep observation and skepticism of government actions are evident.

These labels are intended to capture the general trends and sentiments of each group based on their voting patterns and responses to key policy questions in Japan.

prev Polis2024-05-08


This page is auto-translated from /nishio/Polis2024-05-13 using DeepL. If you looks something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I’m very happy to spread my thought to non-Japanese readers.