related: Analyze Polis data
Experiment to analyze data not derived from Polis
- This page is an experimental log
- See the explanation page here: Opinion cluster analysis of 110,000 people.
Find the previous source code
- I guess I didnât publish it.
- Because itâs not so much a repository.
- I put it here: https://gist.github.com/nishio/094743f24cf0ccf6af346505d35f4846
Changing upstream data
- This data is not Polis-derived, but 0/1 data, with verticals for users and horizontals for questions; unlike Polis, there is no missing data.
- The cluster was easily formed.
Fix error where it gives details not just images.
diff
- for k in range(n_voters):
+ for k in range(len(clusters_K_star)):
This is a bug I didnât notice last time the data was too small, the error occurs when n_voters exceeds 100.
n_voters == 112, len(clusters_K_star) == 100
KeyError: 0
- - Confirmation of data frame structure: - Check the columns and indexes of the matrix to understand in what format the data is stored. - Execute print(matrix.columns) and print(matrix.index) to check its structure. - Looking at the output results, the matrix columns are... String - Cause of KeyError: The original code `matrix[c].iloc[idx_g]` treats c as an integer, when in fact c must be a string.
- Read the code
statements_all_in = range(8)
- This, by the way, was a messy hard code last time.
- Change to
statements_all_in = matrix.columns
. - The phenomenon that is occurring is py
for c in range(N_comments):
matrix[c] # error here
- So:.
py
comment = statements_all_in[c] # comment id
matrix[comment] # OK
- is of the correct form
The output results are, well, convincing.
N=112
- python t.py 2.28s user 6.43s system 558% cpu 1.560 total
- N=1116
- python t.py 2.96s user 5.94s system 613% cpu 1.449 total
- N=2231
- python t.py 8.20s user 9.98s system 803% cpu 2.261 total N=4462
- python t.py 7.22s user 9.72s system 730% cpu 2.318 total N=11153
- python t.py 13.42s user 20.98s system 617% cpu 5.574 total
- N=111526
- python t.py 19.73s user 22.84s system 724% cpu 5.879 total
You can analyze 110,000 cases of data in about 30 seconds.
2024-05-14
Last time, we used 18 different questions, âDoes this matter?â only used 0/1, but this time we will convert the multiple choice for each question to One Hot
I donât usually use Panads, but I can easily write âthings that can be written with for statements, but are a pain if they occur frequentlyâ. py
df = pd.DataFrame(data)
# Filter columns containing 'selectedIndex'
filtered_df = df.filter(like='selectedIndex')
# Create dummy variable (0/1 column)
df_dummies = pd.get_dummies(df, columns=selectedIndex_cols, prefix=selectedIndex_cols)
Easy One Hot Encoding of answers to multiple-choice questions
111526 rows x 74 columns
- This is still about 40 seconds.
Whether there are 1,000 or 110,000 cases, the structure is almost the same
1000 items
110,000
After all, large data tends to be in two clusters.
N=110,000, K=3
. Itâs not funny, so I limited the clusters to three, and this is what I got. Commentary by GPT Based on the voting patterns and interests of each group, we suggest the following group names
- Group 0: Cautious Evaluators
- Rationale: This group consistently provides neutral or mild feedback to the âneitherâ or â0â option on questions about economic policy and security law. It represents a cautious and unbiased viewpoint.
- Group 1: Pragmatic Supporters
- Rationale: This group tends to support government initiatives such as tax increases, security legislation, and relocation of military bases. They demonstrate a realistic acceptance of government actions. Their voting patterns are significantly skewed toward positive or supportive responses.
- Group 2: Constitutional Conservatives
- Rationale: This groupâs concerns relate to adherence to the Constitution and a cautious stance toward military intervention and nuclear energy. It maintains constitutional norms against changes in state policy and expresses a skeptical perspective on the military role and the expansion of nuclear energy. It reflects a cautious and conservative approach. These names are intended to reflect the voting patterns and primary concerns of each group and focus on responses to specific policy questions without assuming ideological motivations.
impressions
- Commentary adequately reflects the data.
- âYou just put âneitherâ in the middle of a dichotomy!â I thought.
Forced 5-way split
N=110,000, K=5
. Below are the appropriate group names based on the different groupsâ responses to the Japanese political questions:
- Group 0: Progressive Reformers
- This group strongly opposes security legislation, including the right to collective self-defense, takes a position of not appreciating Abenomics, and supports an immediate phase-out of nuclear power. The group also opposes the relocation of the U.S. military base in Okinawa to Henoko, Nago City, and opposes the revision of Article 9 of the Constitution. These positions demonstrate a progressive and reformist approach that values fidelity to the Constitution and is cautious about military and nuclear policies.
- Group 1: Conservative Guardians
- Members of this group generally support security legislation that strengthens Japanâs defense capabilities, appreciate Abenomics, support the relocation of U.S. military bases in Okinawa, and support amending the Constitution to clearly state the Self Defense Forces. They also support the promotion of tourism policies, including casinos, and are conservative, defense-oriented, and focused on economic growth.
- Group 2: Cautious Moderates
- This group is somewhat more moderate in its support for Abenomics and security legislation than Group 1, supporting initiatives such as casinos and base relocation, but cautious about fiscal policies such as tax increases. They have a balanced view that weighs potential benefits and social impacts.
- Group 3: Indifferent Skeptics
- They appear to be skeptical or difficult to make decisions on major issues, with noticeably neutral reactions on nuclear electric policy, and are not positive about major changes or strong opinions. This may reflect an attitude of indifference or low engagement.
- Group 4: Critical observers
- This group is highly critical of current policies such as Abenomics and security legislation and resists the relocation of U.S. military bases. Their critical stance on key economic and defense policies and their deep observation and skepticism of government actions are evident.
These labels are intended to capture the general trends and sentiments of each group based on their voting patterns and responses to key policy questions in Japan.
prev Polis2024-05-08
This page is auto-translated from /nishio/Polis2024-05-13 using DeepL. If you looks something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. Iâm very happy to spread my thought to non-Japanese readers.