[WINDOWS] Pre-Processing Multiple LocalDocs Repositories - and Moving 'State' File and Folders between Installations? #3164
Replies: 1 comment
-
Hello. If you look into the localdocs_v2/v3 database, you'll see that the names of collections and files are themselves tied to the indices, with each entry carrying its respective filename. That means collections with the same names, containing the same files as on the "source" computer, would have to exist on the "destination" computers, because some process apparently needs to access those specific field values. I don't know the whats, whys and hows, because that process is not explained. Nor is it explained why the preprocessed info cannot be reused by accessing only the text sequences stored in the database. Logically those are the only essential info needed during RAG, yet the database currently holds 10^4 instances of the same filename wherever 10^4 text sequences were extracted from one file.
So, because RAG here somehow depends on filenames and collection names (through the database design plus the code that retrieves info from it), there is no way to reuse the preprocessed info on a computer where those filenames and collections do not exist. This appears not to have been a concern when the program was designed and architected, because it is supposed to be for local use in utmost privacy, literally aimed at one computer only. So now we have a form that was fixed from the beginning, difficult to change except with clumsy workarounds: a fixed frame on top of which functionality keeps being added, to the point where this scaffolding stays set in stone for eternity, without any possibility or will to make it flexible.
You may want to take a look at that database with the LocalDocs Inspector that I've built.
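For anyone who wants to check the structure described above without extra tooling, here is a minimal sketch in Python. It assumes the LocalDocs database is an ordinary SQLite file, which this thread does not confirm, and it makes no assumption about table or column names: it only lists whatever tables exist, their columns, and their row counts. The function name `summarize_sqlite` is hypothetical and is not part of GPT4All or the LocalDocs Inspector.

```python
import sqlite3
from pathlib import Path


def summarize_sqlite(db_path):
    """Return {table_name: (column_names, row_count)} for a SQLite file.

    Assumption: db_path points at a SQLite database. Nothing here is
    specific to GPT4All; the schema is discovered, not hard-coded.
    """
    # Open read-only so the inspection cannot modify the database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        summary = {}
        for table in tables:
            # Quote the identifier so unusual table names cannot break the SQL.
            quoted = '"' + table.replace('"', '""') + '"'
            columns = [row[1] for row in
                       conn.execute(f"PRAGMA table_info({quoted})")]
            try:
                count = conn.execute(
                    f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
            except sqlite3.OperationalError:
                # E.g. a virtual table whose module is not available here.
                count = None
            summary[table] = (columns, count)
        return summary
    finally:
        conn.close()
```

Calling `summarize_sqlite(...)` with the path to a copy of the database file and printing the result should show whether filenames and collection names are repeated per text sequence, as described above. Working on a copy rather than the live file is the safer choice while the application is running.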
-
Hi all -
Fantastic project! I'm only a day into this but with my 2080 Super I feel as though I literally have a cloud-equivalent model to play with - offline!
Question and thinking: suppose I "pre-process" (defined as ingesting, indexing and embedding) LocalDocs on one computer, along with the folders of (say, for example) PDF and DOC files. Can I later install GPT4All on a fresh Windows instance, then move the 'config' or 'state' files that represent those indices and embeddings, along with the 'source' LocalDocs folders with the PDF and DOC files, to that other computer - and have it work?
After I do quite a bit of ingestion (probably 250GB!) of materials, my thought was to prepare this on another computer as a backup system, and not have to wait the days/weeks for the indexing/embedding to finish.
As an example, I processed 600MB of PDF files the other night (i9-12xxx, 64GB, fast NVMe, 2080 Super) and it took nearly 12 hours for 115,000 embeddings across ~1200 files.
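To put that sample run in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes processing time scales linearly with input size and that 1 GB = 1000 MB. Both are simplifying assumptions, and the helper names are made up for illustration.

```python
def extrapolate_hours(sample_mb, sample_hours, target_gb, mb_per_gb=1000):
    """Scale a measured run linearly to a larger corpus (assumed linear)."""
    return target_gb * mb_per_gb / sample_mb * sample_hours


def embeddings_per_second(embedding_count, hours):
    """Average throughput over the measured run."""
    return embedding_count / (hours * 3600)


# Figures from the post: 600 MB in about 12 hours, 115,000 embeddings.
hours = extrapolate_hours(sample_mb=600, sample_hours=12, target_gb=250)
rate = embeddings_per_second(115_000, 12)

print(f"{hours:.0f} hours, about {hours / 24:.0f} days")  # 5000 hours, about 208 days
print(f"{rate:.2f} embeddings per second")                # 2.66 embeddings per second
```

If the linear assumption holds even roughly, 250GB would take months rather than days or weeks on this hardware, which is why being able to move the finished index matters so much.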
I could prepare all of this in the cloud, I suppose, and quickly 'pre-process' there, but then I still have the same question: how to move to another computer?
Thanks!