spaCy Model for Column auto mapping #5434
vagarwal77
started this conversation in
Help: Best practices
Replies: 1 comment
-
From your question, it is also not entirely clear to me which NLP challenge you're trying to tackle. Automatically mapping field names in databases feels like a "dangerous" thing to do only with NLP. In your example, a connection is made between database fields that have the same name (and presumably the same type) but that doesn't seem to require NLP. Either way I think you'll need to implement a custom algorithm suited for your use-case.
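To illustrate the point that same-name, same-type matching does not need NLP, here is a minimal sketch using only the standard library's `difflib`. The function names `normalize` and `match_columns`, the 0.8 threshold, and the example columns are all assumptions made for illustration; they do not come from the original post, and a real mapping would still need manual review.

```python
from difflib import SequenceMatcher


def normalize(name):
    # Lowercase and drop separators so "Patient_ID" and "patientid" compare equal.
    return "".join(ch for ch in name.lower() if ch.isalnum())


def match_columns(src_cols, dest_cols, threshold=0.8):
    """Map each source column to its most similar destination column of the same type.

    src_cols and dest_cols are lists of (column_name, column_type) tuples.
    Returns {src_name: (dest_name, score)}; columns scoring below the
    threshold (an assumed cut-off) are left unmapped.
    """
    mapping = {}
    for src_name, src_type in src_cols:
        best_name, best_score = None, 0.0
        for dest_name, dest_type in dest_cols:
            if src_type != dest_type:
                continue  # only consider columns with the same declared type
            score = SequenceMatcher(
                None, normalize(src_name), normalize(dest_name)
            ).ratio()
            if score > best_score:
                best_name, best_score = dest_name, score
        if best_name is not None and best_score >= threshold:
            mapping[src_name] = (best_name, round(best_score, 2))
    return mapping


# Hypothetical example columns, not taken from the original post.
src = [("patient_id", "int"), ("birth_dt", "date"), ("notes", "varchar")]
dest = [("PatientID", "int"), ("birth_date", "date"), ("gender", "varchar")]
result = match_columns(src, dest)
```

With these made-up inputs, `patient_id` and `birth_dt` find a counterpart, while `notes` stays unmapped because no destination column of the same type is similar enough. Leaving low-scoring columns unmapped rather than guessing is a deliberate choice here, given how risky a wrong automatic mapping can be.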
-
I am working on a healthcare project where we have different dataset schemas from different providers. Each dataset has approx. 100 tables, and each table has approx. 20 columns, so in total about 100 * 20 = 2000 entities.
We need to map these 2000 entities to a standard Common Data Model.
Performing these mappings manually is very tedious and error-prone; hence, I would like to use the semantics (column name, column datatype, length, or column description) to map the columns from both data sources automatically.
I am trying to develop a semantic model to auto-map the columns between 2 tables based upon column name.
I have sample mapping data for training purposes, like below -
The algorithm I am looking for is -
Find the column-to-column mappings based upon the training datasets, as below -
import pandas as pd

# `mapping` is assumed to be the path to the CSV file with the sample mappings.
df = pd.read_csv(mapping, sep=',',
    usecols=['src_column', 'src_table', 'src_column_length', 'src_column_type', 'src_column_desc', 'dest_column',
        'dest_table', 'dest_column_length', 'dest_column_type', 'dest_column_desc'], encoding='utf8')
I can also feed 1000 records from each table to provide more insight into the column contents.
Now, if I provide 2 distinct tables with columns, I would like the model to provide me the mapping between columns from both tables.
Please suggest which model we need to load -
should I use nlp = spacy.load("en")?
Any code sample to achieve the above would be great. I am using Python 3.
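The idea of using sampled records to gain insight into column contents could be sketched roughly as follows. This is not a spaCy-based solution; it is a plain-Python illustration that ranks candidate destination columns by the Jaccard overlap of their sampled values. The helper names `jaccard` and `rank_by_content` and the sample data are hypothetical, and this kind of overlap is only likely to be informative for coded or categorical columns, not for free text or identifiers.

```python
def jaccard(a, b):
    # Jaccard similarity of two sets: |a & b| / |a | b|, taken as 0.0 when both are empty.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def to_value_set(values):
    # Compare values as trimmed, lowercased strings so "M" and " m " count as equal.
    return {str(v).strip().lower() for v in values}


def rank_by_content(src_samples, dest_samples):
    """Rank destination columns for each source column by sampled-value overlap.

    Both arguments map a column name to an iterable of sampled values.
    Returns {src_column: [(dest_column, score), ...]}, best score first.
    """
    dest_sets = {name: to_value_set(vals) for name, vals in dest_samples.items()}
    ranking = {}
    for src_name, vals in src_samples.items():
        src_set = to_value_set(vals)
        scores = [
            (dest_name, jaccard(src_set, dest_set))
            for dest_name, dest_set in dest_sets.items()
        ]
        ranking[src_name] = sorted(scores, key=lambda pair: pair[1], reverse=True)
    return ranking


# Hypothetical sampled values, not taken from any real dataset.
src_samples = {"sex": ["M", "F", "F", "M"], "state_cd": ["NY", "CA", "TX"]}
dest_samples = {"gender": ["m", "f", "u"], "state": ["CA", "NY", "WA", "TX"]}
ranking = rank_by_content(src_samples, dest_samples)
```

In this made-up example, `sex` ranks `gender` first and `state_cd` ranks `state` first. A content score like this could be combined with name and type similarity, with the final mapping still confirmed by a person.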