Vector search for /plurality-japanese

prev

reading

$ git clone https://github.com/nishio/omoikane-embed-core plurality-japanese-embed

$ pip install -r requirements.txt
ModuleNotFoundError: No module named 'distutils'

Ensure distutils is installed: distutils was part of the standard library up to Python 3.11. It was deprecated in Python 3.10 and removed in Python 3.12. If you’re using Python 3.12 or later, install setuptools, which provides a replacement. Ha, I see. When I made this, it was 3.10; now it’s 3.12.
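
A quick way to tell which case applies is to look at the interpreter version. This is only a rough sketch; the helper name stdlib_has_distutils is made up for illustration and is not part of omoikane-embed-core.

```python
import sys


def stdlib_has_distutils(version=None):
    """Return True if this Python version still ships distutils in the stdlib.

    distutils was deprecated in 3.10 (PEP 632) and removed in 3.12.
    """
    if version is None:
        version = sys.version_info[:2]
    return tuple(version[:2]) < (3, 12)


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    if stdlib_has_distutils():
        print(f"Python {major}.{minor}: distutils is still in the standard library")
    else:
        print(f"Python {major}.{minor}: distutils was removed; install setuptools")
```

On 3.12, installing setuptools is usually enough to make old imports of distutils work again.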

It worked with various modifications.


% python make_vecs_from_json/main.py
processing 769 pages
100%|██████████████████████████| 769/769 [00:05<00:00, 139.02it/s]
total tasks: 7470,  0.0% was cached
processing 7470 tasks in 150 batches
100%|██████████████████████████| 150/150 [06:29<00:00,  2.60s/it]
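
The numbers in this output are consistent with a batch size of 50, since 7470 tasks divided by 50 rounds up to 150 batches. The batch size is inferred from the log, not read from the code, and split_into_batches is a hypothetical helper used only for this sketch.

```python
import math


def split_into_batches(tasks, batch_size):
    """Split a list of tasks into consecutive batches of at most batch_size."""
    return [tasks[i:i + batch_size] for i in range(0, len(tasks), batch_size)]


# Assumption: batch size 50, inferred from "7470 tasks in 150 batches".
BATCH_SIZE = 50
tasks = list(range(7470))
batches = split_into_batches(tasks, BATCH_SIZE)

print(len(batches))                          # number of batches
print(math.ceil(len(tasks) / BATCH_SIZE))    # same number via ceiling division
print(len(batches[-1]))                      # the last batch is only partly full
```

At about 2.60 seconds per batch, 150 batches take roughly 390 seconds, which matches the 06:29 shown by the progress bar.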


upload :

% python upload_vecs/main.py 
uploading plurality-japanese.pickle
100%|██████████████████████████| 74/74 [00:24<00:00,  3.06it/s]
OK

before/after (two images)

Experiment with blocksize=100

  • Developing views in parallel while waiting for results
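
For reference, a rough sketch of what changing the block size means. The real tokenizer is not shown in this log, so whitespace-separated words stand in for tokens here, and chunk_tokens is a hypothetical name.

```python
def chunk_tokens(tokens, block_size):
    """Cut a token list into consecutive blocks of at most block_size tokens."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [tokens[i:i + block_size] for i in range(0, len(tokens), block_size)]


# Assumption: whitespace splitting stands in for the real tokenizer.
text = " ".join(f"w{i}" for i in range(1200))
tokens = text.split()

chunks_500 = chunk_tokens(tokens, 500)
chunks_100 = chunk_tokens(tokens, 100)
print(len(chunks_500), len(chunks_100))
```

Smaller blocks mean more chunks per page, so more embedding tasks and a longer run.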

result :

% python make_vecs_from_json/main.py
processing 769 pages
100%|██████████████████████████| 769/769 [00:03<00:00, 224.82it/s]
total tasks: 19866,  13.4% was cached
processing 17205 tasks in 345 batches
100%|██████████████████████████| 345/345 [12:19<00:00,  2.14s/it]
% python upload_vecs/main.py        
uploading plurality-japanese.pickle
100%|██████████████████████████| 239/239 [01:18<00:00,  3.05it/s]
OK
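
The "13.4% was cached" line suggests that chunks which already have an embedding are skipped. A minimal sketch of that idea, assuming a dict keyed by chunk text. The actual cache format of omoikane-embed-core is not shown in this log, and split_cached is a hypothetical helper.

```python
def split_cached(texts, cache):
    """Separate texts into those already in the cache and those still to embed."""
    cached = [t for t in texts if t in cache]
    todo = [t for t in texts if t not in cache]
    return cached, todo


# Toy data sized to match the log: 19866 tasks, 17205 of them not yet cached.
texts = [f"chunk-{i}" for i in range(19866)]
cache = {f"chunk-{i}": [0.0] for i in range(19866 - 17205)}  # placeholder vectors

cached, todo = split_cached(texts, cache)
ratio = 100 * len(cached) / len(texts)
print(f"total tasks: {len(texts)}, {ratio:.1f}% was cached")
print(f"processing {len(todo)} tasks")
```

With 2661 of 19866 tasks cached, the ratio comes out to the 13.4% reported above.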


The smaller-chunk run cost about $0.36.

view

% git clone https://github.com/nishio/omoikane-vecsearch plurality-vecsearch-ja

% npm install

  • I ran npm audit fix --force and pushed the result back to omoikane-vecsearch.
  • % npm run dev
    • Then check that search works properly locally.

% git remote rename origin upstream
% git remote add origin https://github.com/nishio/plurality-vecsearch-ja.git
% git branch -M main
% git push -u origin main

Open the Vercel dashboard.

The build and deploy succeeded, but the search target project does not appear to be set.

before / after (images)

Hmm. Well, we can improve this part later.

Release!

What we did today, from /plurality-japanese/vector-search-improvement

  • ✅ Create a separate service for Japanese only
  • Chunk improvements
    • ✅ Include chunks of 100 tokens as well as the 500-token chunks used so far
    • ✅ Show only 1 hit chunk per page
  • Adding data
    • ✅ 1: First, this Scrapbox
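
The "only 1 hit chunk per page" rule can be sketched as follows. Assumption: hits arrive as (page, chunk, score) tuples where a higher score is better. These names are hypothetical and are not the ones used in omoikane-vecsearch.

```python
def best_hit_per_page(hits):
    """Keep only the highest-scoring chunk for each page, best pages first."""
    best = {}
    for page, chunk, score in hits:
        if page not in best or score > best[page][2]:
            best[page] = (page, chunk, score)
    return sorted(best.values(), key=lambda h: h[2], reverse=True)


# Made-up hits for illustration.
hits = [
    ("page A", "chunk a1", 0.91),
    ("page B", "chunk b1", 0.88),
    ("page A", "chunk a2", 0.85),
    ("page B", "chunk b2", 0.93),
]
for page, chunk, score in best_hit_per_page(hits):
    print(page, chunk, score)
```

This matters more once 100-token chunks are added, because one page can otherwise fill the result list with several neighbouring chunks.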

Vector search results for “vector search”

  • “Vector Search” and other variants are also matched.
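
Variants match because vector search ranks by how close the embeddings are, not by exact string match. A toy sketch with made-up 3-dimensional vectors follows. Real embeddings have far more dimensions, and cosine similarity is assumed here as the metric, since this log does not say which one the service uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors for illustration only.
query = [0.9, 0.1, 0.0]
docs = {
    "vector search": [0.9, 0.1, 0.0],
    "Vector Search": [0.85, 0.15, 0.05],
    "cooking recipe": [0.0, 0.2, 0.9],
}
ranked = sorted(docs, key=lambda k: cosine_similarity(query, docs[k]), reverse=True)
print(ranked)
```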

2024-04-04

Fix issue with GitHub Actions not working


build
Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v3, actions/setup-python@v4. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.

  • actions/checkout: Action for checking out a repo
  • actions/setup-python: Set up your GitHub Actions workflow with a specific version of Python
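
This warning goes away after moving to action versions that run on Node.js 20, which are actions/checkout@v4 and actions/setup-python@v5. A small sketch that rewrites the uses: lines follows. The workflow fragment is made up for illustration and is not this repository's actual workflow file, and bump_actions is a hypothetical helper.

```python
import re

# Major versions of these actions that run on Node.js 20.
NODE20_VERSIONS = {
    "actions/checkout": "v4",
    "actions/setup-python": "v5",
}


def bump_actions(workflow_text):
    """Rewrite 'uses: <action>@vN' lines to the Node.js 20 major versions."""
    def repl(match):
        prefix, action = match.group(1), match.group(2)
        new_version = NODE20_VERSIONS.get(action)
        if new_version is None:
            return match.group(0)  # leave unknown actions untouched
        return f"{prefix}{action}@{new_version}"

    return re.sub(r"(uses:\s*)([\w./-]+)@v[\d.]+", repl, workflow_text)


# Made-up workflow fragment for illustration.
sample = """\
steps:
  - uses: actions/checkout@v3
  - uses: actions/setup-python@v4
    with:
      python-version: "3.12"
"""
print(bump_actions(sample))
```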


The conflict is caused by:
    The user requested protobuf==5.26.1
    grpcio-tools 1.62.1 depends on protobuf<5.0dev and >=4.21.6

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
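
The conflict can be checked by hand: 5.26.1 is outside the range that grpcio-tools 1.62.1 accepts. Below is a simplified sketch that compares release tuples. It ignores pre-release rules (the "dev" in <5.0dev), so it only holds for plain x.y.z versions, and both helper names are hypothetical.

```python
def parse_version(text):
    """Turn 'x.y.z' into a tuple of ints. Plain release versions only."""
    return tuple(int(part) for part in text.split("."))


def allowed_by_grpcio_tools(protobuf_version):
    """Check protobuf>=4.21.6,<5.0dev as required by grpcio-tools 1.62.1."""
    v = parse_version(protobuf_version)
    return (4, 21, 6) <= v < (5, 0)


for candidate in ["5.26.1", "4.25.3", "4.21.5"]:
    print(candidate, allowed_by_grpcio_tools(candidate))
```

So the way out is either to pin protobuf to a 4.x release inside that range or to loosen the pin, as pip suggests.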

diff (images)


This page is auto-translated from /nishio/pVectorSearch2024-04-02 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I’m very happy to spread my thoughts to non-Japanese readers.