text-indexing

Data sources

Pubmed

Download a data file from Pubmed. The demo uses pubmed24n1219.xml.gz.

Unzip it to ./data-source/pubmed24n1219.xml

Temp test

yarn tsx src/es-test.ts

Description

The image provides an overview of the system architecture. Let's break it down:

Data Sources:
- Pubmed XML
- Twitter JSON

Parsing:
- XML Parser for Pubmed
- JSON Parser for Twitter

Separate parsers for each data source ensure that each format is handled correctly.
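
The one-parser-per-source idea could be sketched roughly as below. This is only an illustration: the `ParsedRecord` shape, the `SourceParser` interface, and both class names are hypothetical and are not taken from the repository. The Pubmed side uses a naive regular-expression extraction to keep the example dependency-free; a real implementation would more likely use a proper XML parser.

```typescript
// Hypothetical common shape returned by every parser (illustration only).
interface ParsedRecord {
  source: "pubmed" | "twitter";
  id: string;
  text: string;
}

// Each data source gets its own parser behind one shared interface.
interface SourceParser {
  parse(raw: string): ParsedRecord[];
}

// Assumes the input is a JSON array of tweets carrying `id_str` and `text`.
class TwitterJsonParser implements SourceParser {
  parse(raw: string): ParsedRecord[] {
    const tweets = JSON.parse(raw) as Array<{ id_str: string; text: string }>;
    return tweets.map(
      (t): ParsedRecord => ({ source: "twitter", id: t.id_str, text: t.text })
    );
  }
}

// Naive extraction of PMID and ArticleTitle from each <PubmedArticle> element.
// Regex matching is a stand-in here, not a general XML parser.
class PubmedXmlParser implements SourceParser {
  parse(raw: string): ParsedRecord[] {
    const records: ParsedRecord[] = [];
    const articleRe = /<PubmedArticle>([\s\S]*?)<\/PubmedArticle>/g;
    let m: RegExpExecArray | null;
    while ((m = articleRe.exec(raw)) !== null) {
      const body = m[1] ?? "";
      const pmid = /<PMID[^>]*>(\d+)<\/PMID>/.exec(body);
      const title = /<ArticleTitle>([\s\S]*?)<\/ArticleTitle>/.exec(body);
      // Skip articles where either field is missing.
      if (pmid && title) {
        records.push({ source: "pubmed", id: pmid[1], text: title[1].trim() });
      }
    }
    return records;
  }
}
```

Keeping both parsers behind the same interface means later stages can stay unaware of which format a record came from.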

Structured Data:
The parsed data is converted into a unified structured format, including fields for Doc, Meta, Twitter, and Pubmed.
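
One possible shape for such a unified record is sketched below. Only the four top-level groups (Doc, Meta, Twitter, Pubmed) come from the description above; every field inside them, and the `fromPubmed` helper, are assumptions made for illustration.

```typescript
// Source-independent content (field names are hypothetical).
interface Doc {
  title?: string;
  body: string;
}

// Bookkeeping shared by every record (field names are hypothetical).
interface Meta {
  source: "pubmed" | "twitter";
  sourceId: string;
  ingestedAt: string; // ISO 8601 timestamp
}

// Source-specific extras, present only for the matching source.
interface TwitterFields {
  author: string;
}

interface PubmedFields {
  pmid: string;
  journal?: string;
}

interface StructuredData {
  doc: Doc;
  meta: Meta;
  twitter?: TwitterFields;
  pubmed?: PubmedFields;
}

// Hypothetical helper that wraps Pubmed fields into the unified shape.
function fromPubmed(
  pmid: string,
  title: string,
  abstractText: string,
  now: Date = new Date()
): StructuredData {
  return {
    doc: { title, body: abstractText },
    meta: { source: "pubmed", sourceId: pmid, ingestedAt: now.toISOString() },
    pubmed: { pmid },
  };
}
```

Making the source-specific groups optional lets one type cover both sources while keeping the shared fields mandatory.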

Worker:
Processes the structured data, performing tasks such as text normalization and entity extraction.
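
As an illustration of the text normalization step, a minimal sketch might look like the following. The function name and the exact set of steps are assumptions; the worker in the repository may normalize differently.

```typescript
// Hypothetical normalization pass: Unicode compatibility folding,
// lowercasing, and whitespace collapsing.
function normalizeText(input: string): string {
  return input
    .normalize("NFKC") // fold compatibility forms, e.g. ligatures and full-width characters
    .toLowerCase()
    .replace(/\s+/g, " ") // collapse runs of spaces, tabs, and newlines
    .trim();
}
```

For example, `normalizeText("  Hello\n\tWORLD  ")` returns `"hello world"`.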

Document Services:
- Create Doc Service
- Get Doc Service

These services handle document creation and retrieval.
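
The create and get operations could be sketched with an in-memory store as below. This is a stand-in for illustration only: the `StoredDoc` shape and the `InMemoryDocStore` class are hypothetical, and the actual services presumably persist to a database rather than a `Map`.

```typescript
// Hypothetical stored document shape.
interface StoredDoc {
  id: string;
  title: string;
  body: string;
}

// In-memory stand-in for the create and get document services.
class InMemoryDocStore {
  private docs = new Map<string, StoredDoc>();
  private nextId = 1;

  // Assigns a sequential id and stores the document.
  create(input: { title: string; body: string }): StoredDoc {
    const doc: StoredDoc = { id: String(this.nextId++), ...input };
    this.docs.set(doc.id, doc);
    return doc;
  }

  // Returns undefined when no document has the given id.
  get(id: string): StoredDoc | undefined {
    return this.docs.get(id);
  }
}
```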

Search Services:
- Search Service
- Index Doc Service

These services handle search and index requests and interact with the Elasticsearch component.
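
A search service typically translates a user query into an Elasticsearch request body. The sketch below builds a `multi_match` query with pagination. The function name and the field names `doc.title` and `doc.body` are assumptions for illustration; the `multi_match`, `from`, and `size` keys are standard Elasticsearch query DSL.

```typescript
// Shape of the request body this sketch produces.
interface SearchBody {
  from: number;
  size: number;
  query: {
    multi_match: {
      query: string;
      fields: string[];
    };
  };
}

// Builds a paginated full-text query. `page` is 1-based.
function buildSearchBody(q: string, page = 1, pageSize = 10): SearchBody {
  return {
    from: (page - 1) * pageSize, // offset of the first hit to return
    size: pageSize,
    query: {
      multi_match: {
        query: q,
        // "^2" boosts matches in the title (field names are hypothetical).
        fields: ["doc.title^2", "doc.body"],
      },
    },
  };
}
```

The resulting object can be passed as the body of a search request against the relevant index.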

Elasticsearch:
The core search engine, with configuration for analyzers, tokenizers, and mapping.
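
To make the analyzer, tokenizer, and mapping configuration concrete, an index definition might look roughly like this. The analyzer name `english_text` and all field names are assumptions; the `standard` tokenizer and the `lowercase`, `asciifolding`, and `porter_stem` token filters are built into Elasticsearch.

```typescript
// Hypothetical index definition: one custom analyzer plus a mapping that uses it.
const indexDefinition = {
  settings: {
    analysis: {
      analyzer: {
        english_text: {
          type: "custom",
          tokenizer: "standard", // splits text on word boundaries
          filter: ["lowercase", "asciifolding", "porter_stem"],
        },
      },
    },
  },
  mappings: {
    properties: {
      doc: {
        properties: {
          title: { type: "text", analyzer: "english_text" },
          body: { type: "text", analyzer: "english_text" },
        },
      },
      meta: {
        properties: {
          source: { type: "keyword" }, // exact-match filtering, not analyzed
          ingestedAt: { type: "date" },
        },
      },
    },
  },
};
```

An object of this shape is what the create-index API expects as its request body.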

Controller:
Manages incoming search requests from the client.
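
One plausible first step for such a controller is turning raw query-string parameters into a typed request. The sketch below assumes parameters named `q`, `page`, and `size` and a page-size cap of 100; none of these come from the repository.

```typescript
// Hypothetical typed form of an incoming search request.
interface SearchRequest {
  q: string;
  page: number;
  size: number;
}

// Parses raw query-string values and throws an Error on invalid input.
function parseSearchRequest(
  params: Record<string, string | undefined>
): SearchRequest {
  const q = (params.q ?? "").trim();
  if (q === "") {
    throw new Error("q is required");
  }
  const page = Number.parseInt(params.page ?? "1", 10);
  const size = Number.parseInt(params.size ?? "10", 10);
  if (!Number.isInteger(page) || page < 1) {
    throw new Error("page must be a positive integer");
  }
  if (!Number.isInteger(size) || size < 1 || size > 100) {
    throw new Error("size must be between 1 and 100");
  }
  return { q, page, size };
}
```

The controller would pass the parsed request on to the search service and map a thrown error to a client-error response.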

Potential areas for consideration:

- Data validation and error handling are not explicitly shown
- Caching mechanism for frequent searches is not visible
- No visible load balancing for high traffic scenarios
- Security measures are not depicted (e.g., authentication, authorization)
- Monitoring and logging components are not shown

Folder structure

src/
├── parsers/
│   ├── PubmedParser.ts
│   └── TwitterParser.ts
├── services/
│   ├── CreateDocService.ts
│   ├── IndexDocService.ts
│   ├── GetDocService.ts
│   └── SearchService.ts
├── models/
│   ├── Doc.ts
│   ├── Meta.ts
│   └── StructuredData.ts
├── workers/
│   └── DataProcessor.ts
├── controllers/
│   └── SearchController.ts
├── utils/
│   └── types.ts
├── config/
│   └── elasticsearch.ts
└── index.ts

