Vector search for [/plurality-japanese]
prev
- Improvement of /plurality-japanese/vector-search.
reading
- [/omoikane/Omoikane Embed into other projects](https://scrapbox.io/omoikane/Omoikane Embed into other projects).
$ git clone https://github.com/nishio/omoikane-embed-core plurality-japanese-embed
$ pip install -r requirements.txt
ModuleNotFoundError: No module named 'distutils'
Ensure distutils is installed: distutils was included with the standard library through Python 3.11. It was deprecated in Python 3.10 and removed in Python 3.12. If you're using Python 3.12 or later, consider using setuptools instead for package management and distribution.
Ha, I see. When I made it, it was 3.10; now it's 3.12.
It worked with various modifications.
- The openai library itself has a different interface in 1.0.
- https://github.com/nishio/plurality-japanese-embed/commit/9302de05d31c734d2b047d42607206638192bae7
- I also contributed the change back to omoikane-embed-core.
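For reference, the interface change mentioned above looks roughly like this. It is a minimal sketch, not the code from the linked commit: the helper name `embed_texts` and the default model name are assumptions for illustration. In openai 1.0 and later, calls go through a client object and the response is read with attribute access instead of dict keys.

```python
def embed_texts(client, texts, model="text-embedding-ada-002"):
    """Return one embedding (a list of floats) per input text.

    `client` is expected to be an `openai.OpenAI()` instance (openai>=1.0).
    Before 1.0 the equivalent call was the module-level
    `openai.Embedding.create(...)`, and results were read as
    `response["data"][i]["embedding"]`.
    The default model name is an assumption, not taken from the project.
    """
    response = client.embeddings.create(model=model, input=texts)
    # In 1.0+ the response is a typed object, so use attribute access.
    return [item.embedding for item in response.data]


# Usage (needs `pip install "openai>=1.0"` and OPENAI_API_KEY set):
#   from openai import OpenAI
#   vectors = embed_texts(OpenAI(), ["vector search"])
```

Passing the client in as an argument keeps the helper easy to exercise with a stub, without calling the API.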
% python make_vecs_from_json/main.py
processing 769 pages
100%|██████████████████████████| 769/769 [00:05<00:00, 139.02it/s]
total tasks: 7470, 0.0% was cached
processing 7470 tasks in 150 batches
100%|██████████████████████████| 150/150 [06:29<00:00, 2.60s/it]
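The log suggests the flow: each chunk becomes a task, tasks already in the cache are skipped, and the rest are sent in batches (7470 tasks in 150 batches is consistent with a batch size of 50). A minimal sketch of that flow follows. The function names, the SHA-256 cache key, and the batch size of 50 are assumptions inferred from the log, not the actual code in make_vecs_from_json/main.py.

```python
import hashlib


def cache_key(text):
    """Stable key for a chunk (assumed scheme: SHA-256 of the UTF-8 bytes)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_batches(chunks, cache, batch_size=50):
    """Return (cached_ratio, batches) for the chunks not yet in the cache."""
    todo = [c for c in chunks if cache_key(c) not in cache]
    cached_ratio = 1 - len(todo) / len(chunks) if chunks else 0.0
    # Slice the remaining work into fixed-size batches; the last may be shorter.
    batches = [todo[i:i + batch_size] for i in range(0, len(todo), batch_size)]
    return cached_ratio, batches


# Placeholder chunks matching the task count of the run above, empty cache.
chunks = [f"chunk {i}" for i in range(7470)]
ratio, batches = plan_batches(chunks, cache={})
print(f"total tasks: {len(chunks)}, {ratio:.1%} was cached")
print(f"processing {sum(map(len, batches))} tasks in {len(batches)} batches")
```

With an empty cache this prints the same counts as the log (0.0% cached, 7470 tasks in 150 batches).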
upload :
% python upload_vecs/main.py
uploading plurality-japanese.pickle
100%|██████████████████████████| 74/74 [00:24<00:00, 3.06it/s]
OK
before/after
Experiment with blocksize=100
- Developing the view in parallel while waiting for the results
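As a rough illustration of what changing the block size means: each page is split into consecutive chunks of at most `blocksize` tokens, so a smaller block size yields more, shorter chunks per page. The sketch below uses whitespace-separated words as a stand-in for real tokens. That is an assumption made to keep it dependency-free (it would not work on unsegmented Japanese text, where a tokenizer is needed), and `split_into_chunks` is a hypothetical helper, not the project's function.

```python
def split_into_chunks(text, blocksize=100):
    """Split text into consecutive chunks of at most `blocksize` tokens.

    Tokens here are whitespace-separated words, a stand-in for model tokens.
    """
    tokens = text.split()
    return [
        " ".join(tokens[i:i + blocksize])
        for i in range(0, len(tokens), blocksize)
    ]


page = " ".join(f"w{i}" for i in range(1200))  # a made-up 1200-token page
print(len(split_into_chunks(page, blocksize=500)))  # 3
print(len(split_into_chunks(page, blocksize=100)))  # 12
```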
result :
% python make_vecs_from_json/main.py
processing 769 pages
100%|██████████████████████████| 769/769 [00:03<00:00, 224.82it/s]
total tasks: 19866, 13.4% was cached
processing 17205 tasks in 345 batches
100%|██████████████████████████| 345/345 [12:19<00:00, 2.14s/it]
% python upload_vecs/main.py
uploading plurality-japanese.pickle
100%|██████████████████████████| 239/239 [01:18<00:00, 3.05it/s]
OK
The run with the smaller chunks cost about $0.36.
view
% git clone https://github.com/nishio/omoikane-vecsearch plurality-vecsearch-ja
% npm install
- I ran npm audit fix --force and contributed the result back to omoikane-vecsearch.
% npm run dev
- Then confirm that search works properly locally.
% git remote rename origin upstream
% git remote add origin https://github.com/nishio/plurality-vecsearch-ja.git
% git branch -M main
% git push -u origin main
Open the Vercel dashboard
I was able to build and deploy, but I don't see the search target project set up.
- https://plurality-vecsearch-ja.vercel.app/
- I think it's supposed to be set in a Vercel environment variable.
before / after: hmm. Well, we can improve this part later.
Release!
What we did today from [/plurality-japanese/vector-search-improvement]
- ✅ Create a separate service for Japanese only
- Chunk improvements
- ✅ Include 100-token chunks as well as the 500-token chunks used so far.
- ✅ Show only one hit chunk per page
- Adding data
- ✅ 1: First, this Scrapbox
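The "only one hit chunk per page" rule can be sketched as a small post-processing step over ranked results. This is an illustration only: the `(page_title, chunk_text, score)` tuple shape, the function name, and the sample scores are assumptions, and the real view (omoikane-vecsearch) may implement it differently.

```python
def best_hit_per_page(hits):
    """Keep only the highest-scoring chunk for each page.

    `hits` is an iterable of (page_title, chunk_text, score) tuples,
    where a higher score means a closer match.
    """
    seen = set()
    result = []
    for page, chunk, score in sorted(hits, key=lambda h: h[2], reverse=True):
        if page in seen:
            continue  # a better chunk from this page was already kept
        seen.add(page)
        result.append((page, chunk, score))
    return result


# Made-up hits: with both chunk sizes indexed, one page can match twice.
hits = [
    ("Page A", "500-token chunk", 0.81),
    ("Page A", "100-token chunk", 0.88),
    ("Page B", "100-token chunk", 0.79),
]
for page, chunk, score in best_hit_per_page(hits):
    print(page, chunk, score)
```

Deduplicating after ranking matters once both 100-token and 500-token chunks are indexed, since otherwise a single page could fill several result slots.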
Vector search results for “vector search”
- “Vector Search” and other matches are also found.
2024-04-04
Fix issue with GitHub Actions not working
build
Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v3, actions/setup-python@v4. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
The conflict is caused by:
The user requested protobuf==5.26.1
grpcio-tools 1.62.1 depends on protobuf<5.0dev and >=4.21.6
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
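To see why pip reports a conflict: the pinned protobuf 5.26.1 lies outside the range that grpcio-tools 1.62.1 accepts (at least 4.21.6 and below 5.0). A minimal check follows, comparing versions as integer tuples. This is a simplification that ignores pre-release suffixes such as the `dev` in `<5.0dev`, and `parse` and `satisfies` are hypothetical helpers, not pip internals.

```python
def parse(version):
    """Turn '5.26.1' into (5, 26, 1) for simple numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, lower_inclusive, upper_exclusive):
    """True if lower_inclusive <= version < upper_exclusive."""
    return parse(lower_inclusive) <= parse(version) < parse(upper_exclusive)


# Range required by grpcio-tools 1.62.1, per the error message above.
print(satisfies("5.26.1", "4.21.6", "5.0"))  # False: the pinned version is too new
print(satisfies("4.25.3", "4.21.6", "5.0"))  # True: an example 4.x version that fits
```

Either loosening the protobuf pin to fall inside that range or removing the pin, as pip suggests, would let the resolver find a compatible version.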
This page is auto-translated from /nishio/pVectorSearch2024-04-02 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.