from Diary 2023-06-13 prev pVectorSearch2023-06-07

Make private materials eligible for vector search.

  • I have a feeling we’re going to have an accident.
  • First, put the public flag on the data created from the public Scrapbox.
    • And make the default search from public
    • If you’re going to re-install this data, why not re-crawl it first, even if you don’t add the UPDATE feature in its current state?
  • private is diverse, so make it a separate tag instead of a private flag
  • Display a check box to add records with a specific tag to the search target list only for people with specific permissions.

Add search targets for specific people only

  • Working on authorizing certain people, I want to make it an in-byte code because it would be a mistake for me to work on it.
  • The implementation of this feature is going to be buggy to begin with.

Notion seems to be able to export entire subpages in Markdown.

  • image

If you’re going to re-install this data, why not just re-crawl it first, even if you haven’t added the UPDATE feature to the current situation?

  • âś…Recrawl all together, about 5 minutes. embedding from JSON was not yet batch.
  • Single âś… code Experiment with filtering the search during embedding town
  • I know how to call the API.
  • We need to receive data about the filter from the UI.

Talk about making Notion a search target.

  • I’d like to see you jump to the target when you click on a hit.
  • But unlike Scrapbox, the link destination is not determined by the project and page name.
  • We need to come up with data for the UI.
  • This time, a particular private page is exported, including supppages.
  • If I had the URL of the root page, could I restore it?
    • Oh, I see.
      • local filename
        • Proposal 2: Nishio 7d7c95ceaeb446418bffe462034292bd.md.
      • This part of the ID looks like a worldwide unique ID.
        • You can jump to the destination page with notion.so/7d7c95ceaeb446418bffe462034292bd.
    • Then {type: notion, id: xxx}…

:

processing 15469 pages
100%|...| 15469/15469 [00:53<00:00, 290.87it/s]
processing 13893 tasks in 277 batches)
100%|...| 278/278 [11:27<00:00,  2.47s/it]

Well, if I understand it correctly, this is an implementation that can easily be used for differential updates.

  • Past self, great!

:

# halsk
processing 1281 pages
100%|...| 1281/1281 [00:02<00:00, 619.70it/s]
total tasks: 2221,  1.00% was cached
processing 0 tasks in 0 batches)
0it [00:00, ?it/s]

# yuiseki
processing 2778 pages
100%|...| 2778/2778 [00:09<00:00, 289.72it/s]
total tasks: 4738,  0.92% was cached
processing 375 tasks in 8 batches)
100%|...| 8/8 [00:25<00:00,  3.13s/it]

# tkgshn
processing 5750 pages
100%|...| 5750/5750 [00:11<00:00, 486.96it/s]
total tasks: 12117,  0.97% was cached
processing 308 tasks in 7 batches)
100%|...| 7/7 [00:20<00:00,  2.89s/it]

I was able to update the diff.

I was able to get the code to take the title and ID for each page from the Notion export data.

The rest will be done tomorrow. TODO

  • Creating Data from Notion
  • Create a view of Notion search results
  • Only people with certain privileges get the UI out.
  • Search filtering by checking specific permissions
  • Receive data about filters from the UI
  • Re-enter data into Qdrant.

2023-06-14

  • I thought it would be better to include Discord discussions in the search, so for now, I’ve decided to split the logs by date and put them on Notion this time.
    • Abst of various papers were put into Notion at DeepL
    • Early minutes from Google Docs were also copied and pasted into Notion.

Create data from âś…Notion

  • Try it in Local’s qdrant.
    • âś…Confirmation that it is a hit if it is left as is.
    • Confirm that if you are searching from âś…public only, you will not get any hits. âś…Create a view of the Notion search results
  • You can now click on a search result to jump to either Scrapbox or Notion.

Only people with certain privileges get the UI out.

Search filtering by checking specific permissions

  • When featureA is ON, that information is also passed to the search API. Then, after confirming that the user is authorized, the search is reflected in the filter settings.
  • No, this is a different issue, being able to switch the search target with a checkbox and allowing only those with special privileges to search for special targets.
  • Let’s just do the latter first because it’s too complicated to do all at once.
  • It’s done.

The only thing left to do is to find a way to make it possible to give Custom Claims to specific users without me having to mess with the DB.

  • Take password with prompt and send it to setCustomClaim API
  • Why was I able to set it up last time but not this time?
  • When a user’s new claim is changed in the Admin SDK, it is propagated to the authenticated user on the client side by an identity token as follows

  • The user logs in or re-authenticates after a custom claim change. The resulting identity token issued will contain the latest claim.

  • When an old token expires, its identity token is renewed in the existing user session.

  • The identity token is forced to be updated by calling currentUser.getIdToken(true).

  • Ah, I see, it’s cached.
  • UI aside, it’s done.
  • Custom claims should be added, but now they’re being overwritten, which is good because we only have one right now, but we need to fix it when we do the second one.

image

  • image

Send data to the cloud side

  • All replaced this time.
  • The write operation timed out` even if I put in code to stop indexing and wait 1 second for each batch insert.
  • I changed the code to wait 10 seconds and retry when it fails.
  • image
    • Wow, three bumps in the road at the beginning, but the rest is pretty steady.
    • If so, then the current design, which only sleeps more when it falls, instead of sleeping every time, seems to be correct.

The problem gets worse on iPhone, I figured out how to reproduce it on PC.

  • imageimage
  • Fixed.

Deploying to Vercel environment is rubbish.

  • Forgot to set environment variables
    • Why don’t we just do the Vercel side when we change the .env.local?
  • There was a path that did not pass through Firebase initialization.
    • careless mistake

It worked, and we’re happy.


This page is auto-translated from /nishio/pVectorSearch2023-06-13 using DeepL. If you looks something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I’m very happy to spread my thought to non-Japanese readers.