UTAS-UMAP

nishio UMAP visualization from data from the 2022 Upper House election by the University of Tokyo’s Taniguchi Lab, Communist Island (Communist Island) and so on. image - Joint research by Taniguchi Laboratory, University of Tokyo and Asahi Shimbun (UTAS) / UMAP

nishio Communal island and nearby mixed islands image

nishio Self-supporting island, public island and a large mixed island nearby. image

nishio Restoration Island image

image imageimage imageimageimage

from pPolis2024-05-27 pPolis

UTAS UMAP

image

image

py

import umap

# Dimensionality reduction by UMAP
reducer = umap.UMAP(
    n_neighbors=15, min_dist=0.1, n_components=2, random_state=42, n_jobs=1
)
embedding = reducer.fit_transform(matrix)


# Create a DataFrame containing the UMAP results and include the matrix index (User_ID)
umap_df = pd.DataFrame(embedding, columns=["UMAP1", "UMAP2"])
umap_df["User_ID"] = matrix.index # index of matrix as User_ID

# Add User_ID column to data, then merge with umap_df
data = data.reset_index().rename(
    columns={"index": "User_ID"}
) # Add index as User_ID
merged_data = pd.merge(data, umap_df, on="User_ID", how="inner") # Merge by User_ID

# Add column of party names
merged_data["PARTY_NAME"] = merged_data["PARTY"].map(political_parties)
data = merged_data

# Prepare color map
unique_parties = data["PARTY"].unique()
colors = plt.cm.tab10(np.linspace(0, 1, len(unique_parties)))

parties_letter = {
    "Liberal Democratic Party": "自", "自", "自", "自", "自", "自", "自
    "Constitutional Democratic Party of Japan": "standing", "standing".
    "Komeito": "ko", "ko", "ko", "ko", "ko", "ko", "ko
    "Japan Restoration Association": "維", "restoration".
    "National Democratic Party": "country", "country".
    "Communist Party": "ĺ…±", "ĺ…±", "ĺ…±", "ĺ…±", "ĺ…±", "ĺ…±", "ĺ…±
    "Social Democratic Party": "Sha", "Sha".
    "Reiwa Shinsengumi": "Re", "Re".
    "NHK Party": "N",.
    "suffragettes": "suffragettes", "suffragettes".
    "various": "various", "various", "various", "various
    "Independent Candidate": "No", "No".
}


# Visualization
plt.figure(figsize=(10, 7))
for i, party in enumerate(unique_parties):
    party_data = data[data["PARTY"] == party]
    pname = political_parties[party]
    pletter = parties_letter[political_parties[party]]
    plt.scatter(
        party_data["UMAP1"],
        party_data["UMAP2"],
        s=0,
        marker="+",
        color=colors[i],
        label=f"{pname}({pletter})",
    )
    for p in party_data.iterrows():
        plt.text(
            p[1]["UMAP1"],
            p[1]["UMAP2"],
            s=pletter,
            color=colors[i],
            fontsize=5,
            ha="center",
            va="center",
        )

plt.title("Political Parties UMAP Visualization")
plt.xlabel("UMAP1")
plt.ylabel("UMAP2")
plt.legend(title="political party")
plt.show()

Next, it would be interesting to apply Polis-like cluster feature extraction to these clusters, and explain AI clusters in the manner of a public opinion map.


Regarding the Polis-like visualization of the 2022 council elections, I used to do it with PCA, and the other day I had AI explain that principal component axis to me, and it wasn’t very interesting. Based on that, I thought, “Well, it’s okay if I can’t find meaning in the axis, so I can visualize it in a different way,” and here is what I tried using UMAP, which is used in Talk to the City. It’s kind of split like that.

It’s interesting that there are “radical Communists” on a remote islet who are gathered only by the Communists and “mild Communists” who live together with other parties. So are the people. Who are the LDP people who are mixed in with the remote isles of the Restoration (if you look it up, of course)?

I was thinking of having the AI explain it by clustering the islands after UMAP, but if they are separated like this, it might make more sense to have a human cluster the islands that are clearly separated by specifying the range and clustering only the large islands into a few islands.


To add context, this data, which was only divided into two parts by PCA of 600 people, was divided into six parts by UMAP, so this was a preliminary experiment to see if UMAP could be used to create a detailed map that divides the public opinion map into two parts.


Select a range from Scatter and display where it is selected py

# Function to display data in a selection
def on_select(eclick, erelease):
    x1, y1 = eclick.xdata, eclick.ydata # Start point of selection
    x2, y2 = erelease.xdata, erelease.ydata # End point of selection

    # Show selections and data
    print(
        f"Selected range: x=[{min(x1, x2):.2f}, {max(x1, x2):.2f}], y=[{min(y1, y2):.2f}, {max(y1, y2):.2f}]"
    ) 

from matplotlib.widgets import RectangleSelector

# Monitor selection events using RectangleSelector
fig, ax = plt.subplots()
selector = RectangleSelector(ax, on_select, useblit=True, button=[1], interactive=True)

# Visualization
# plt.figure(figsize=(10, 7))
for i, party in enumerate(unique_parties):
    party_data = data[data["PARTY"] == party]
    pname = political_parties[party]
    pletter = parties_letter[political_parties[party]]
    ax.scatter(
        party_data["UMAP1"],
        party_data["UMAP2"],
        s=0,
        marker="+",
        color=colors[i],
        label=f"{pname}({pletter})",
    )
    for p in party_data.iterrows():
        ax.text(
            p[1]["UMAP1"],
            p[1]["UMAP2"],
            s=pletter,
            color=colors[i],
            fontsize=5,
            ha="center",
            va="center",
        )

plt.title("Political Parties UMAP Visualization")
plt.xlabel("UMAP1")
plt.ylabel("UMAP2")
plt.legend(title="political party")


plt.show()

from Black Boxes and Digging In image

  • image
  • I’m working on Polisian Analysis (Polis without the real-time scatter plots) in one code right now, but now that I understand the parts, I’ll break it down.
  • Given a set of clusters, a component that produces a characteristic opinion that separates the in’s and out’s of the clusters.
    • The figure depicts the Fisher exact test, but that is not the exact name.
      • In technical terms Fisher’s exact probability test.
      • The intention of using the components is completely different.
        • We are not testing whether they are different clusters, but rather we are diverting the scale used to test the cluster to the scale used to select the questions that characterize the cluster.
    • Maybe we don’t have the right words yet to describe the concept because we haven’t treated this part as a component.
  • I want to cut out this part of the component so that it can be used independently.

In the same way, if we UMAP the poll map data and give it a party icon, we might be able to identify opinion clusters that are not represented by a political party…


2024-10-31

nishio The first one, the second one is the characteristic opinion vector that this range of legislators have. imageimage nishio I’m talking about the AI not only clustering on its own, but also allowing humans to specify “this range” for analysis. nishio That means we’ve made it this far. image

I’m curious to see what the difference of opinion is between the pure communist island further to the upper left and the rest of this cluster.

  • Mathematically, it should be possible…
    • Technically…
  • I was frustrated (or at least it wasn’t something I could choosy about).

I gave up and did individual clusters, which surprisingly accomplished my goal.

I left the results here. - UMAP visibility for the 2022 Upper House election


This page is auto-translated from /nishio/pPolis2024-10-29 using DeepL. If you looks something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I’m very happy to spread my thought to non-Japanese readers.