
Absurd relation extraction results from the out-of-the-box DeepKE-cnSchema release #522

Open
Daniel-ChenJH opened this issue Jun 6, 2024 · 8 comments
Labels
question Further information is requested

Comments

@Daniel-ChenJH

Hi, I followed the instructions here and set up the environment for the out-of-the-box DeepKE-cnSchema release. For short sentences, typically with only two entities and an explicit relation word, the program predicts the relation correctly. But for slightly longer sentences with a few more entities, the relation extraction gives fairly absurd results, and with rather high confidence at that.

Here is the content of my predict.yaml; both models are the Bert-wwm models from the README.

text: '林忠钦,男,汉族,1957年12月6日出生,浙江省宁波市人,机械工程专家,中国工程院院士,教育部科技委先进制造学部主任。曾任上海交通大学校长、党委副书记。'
nerfp: '/home/cjh/domain/DeepKE/example/ner/standard/checkpoints'
refp: '/home/cjh/domain/DeepKE/example/re/standard/re_bert.pth'

The sentence here is a bit longer; let's see what the model produced. Here is the log:

[2024-06-06 14:41:26,132][__main__][INFO] - {'YAS': '人物', 'TOJ': '影视作品', 'NGS': '目', 'QCV': '生物', 'OKB': 'Number', 'BQF': 'Date', 'CAR': '国家', 'ZFM': '网站', 'EMT': '网络小说', 'UER': '图书作品', 'QEE': '歌曲', 'UFT': '地点', 'GJS': '气候', 'SVA': '行政区', 'ANO': 'Text', 'KEJ': '历史人物', 'ZDI': '学校', 'CAT': '企业', 'GCK': '出版社', 'FQK': '书籍', 'BAK': '音乐专辑', 'RET': '城市', 'QZP': '景点', 'QAQ': '电视综艺', 'ZRE': '机构', 'TDZ': '作品', 'CVC': '语言', 'PMN': '学科专业'}
[2024-06-06 14:41:26,133][__main__][INFO] - {'祖籍': 'https://cnschema.openkg.cn/item/%E7%A5%96%E7%B1%8D', '父亲': 'https://cnschema.openkg.cn/item/%E7%88%B6%E4%BA%B2/11999902#viewPageContent', '总部地点': 'https://cnschema.openkg.cn/item/%E6%80%BB%E9%83%A8', '出生地': 'https://null.url.here', '目': 'https://cnschema.openkg.cn/item/%E7%9B%AE/7874912#viewPageContent', '面积': 'https://cnschema.openkg.cn/item/%E9%9D%A2%E7%A7%AF', '简称': 'https://cnschema.openkg.cn/item/%E7%AE%80%E7%A7%B0', '上映时间': 'https://null.url.here', '妻子': 'https://cnschema.openkg.cn/item/%E5%A6%BB%E5%AD%90/52626', '所属专辑': 'https://cnschema.openkg.cn/item/%E4%B8%93%E8%BE%91', '注册资本': 'https://cnschema.openkg.cn/item/%E6%B3%A8%E5%86%8C%E8%B5%84%E6%9C%AC', '首都': 'https://cnschema.openkg.cn/item/%E9%A6%96%E9%83%BD/26194', '导演': 'https://cnschema.openkg.cn/item/%E5%AF%BC%E6%BC%94/307826', '字': 'https://cnschema.openkg.cn/item/%E5%AD%97%E5%8F%B7', '身高': 'https://cnschema.openkg.cn/item/%E8%BA%AB%E9%AB%98', '出品公司': 'https://cnschema.openkg.cn/item/%E5%87%BA%E5%93%81%E4%BA%BA', '修业年限': 'https://cnschema.openkg.cn/item/%E4%BF%AE%E4%B8%9A%E5%B9%B4%E9%99%90', '出生日期': 'https://cnschema.openkg.cn/item/%E7%94%9F%E6%97%A5/2690537', '制片人': 'https://cnschema.openkg.cn/item/%E5%88%B6%E7%89%87%E4%BA%BA/18127#viewPageContent', '母亲': 'https://cnschema.openkg.cn/item/%E6%AF%8D%E4%BA%B2/5511#viewPageContent', '编剧': 'https://cnschema.openkg.cn/item/%E7%BC%96%E5%89%A7/1705096', '国籍': 'https://cnschema.openkg.cn/item/%E5%9B%BD%E7%B1%8D', '海拔': 'https://cnschema.openkg.cn/item/%E6%B5%B7%E6%8B%94/5754', '连载网站': 'https://null.url.here', '丈夫': 'https://cnschema.openkg.cn/item/%E4%B8%88%E5%A4%AB/4404', '朝代': 'https://cnschema.openkg.cn/item/%E6%9C%9D%E4%BB%A3', '民族': 'https://cnschema.openkg.cn/item/%E6%B0%91%E6%97%8F/665', '号': 'https://cnschema.openkg.cn/item/%E5%AD%97%E5%8F%B7', '出版社': 'https://cnschema.openkg.cn/item/%E5%87%BA%E7%89%88%E7%A4%BE', '主持人': 'https://cnschema.openkg.cn/item/%E4%B8%BB%E6%8C%81%E4%BA%BA/4690681', '专业代码': 'https://cnschema.openkg.cn/item/%E6%99%AE%E9%80%9A%E9%AB%98%E7%AD%89%E5%AD%A6%E6%A0%A1%E6%9C%AC%E7%A7%91%E4%B8%93%E4%B8%9A%E7%9B%AE%E5%BD%95?fromtitle=%E4%B8%93%E4%B8%9A%E4%BB%A3%E7%A0%81&fromid=7911485', '歌手': 'https://cnschema.openkg.cn/item/%E6%AD%8C%E6%89%8B/16693#viewPageContent', '作词': 'https://cnschema.openkg.cn/item/%E4%BD%9C%E8%AF%8D', '主角': 'https://cnschema.openkg.cn/item/%E4%B8%BB%E8%A7%92/32402', '董事长': 'https://cnschema.openkg.cn/item/%E8%91%A3%E4%BA%8B%E9%95%BF/356514', '成立日期': 'https://cnschema.openkg.cn/item/%E5%85%AC%E5%8F%B8%E6%88%90%E7%AB%8B/7163008', '毕业院校': 'https://cnschema.openkg.cn/item/%E9%99%A2%E6%A0%A1', '占地面积': 'https://cnschema.openkg.cn/item/%E5%8D%A0%E5%9C%B0%E9%9D%A2%E7%A7%AF', '官方语言': 'https://cnschema.openkg.cn/item/%E5%AE%98%E6%96%B9%E8%AF%AD%E8%A8%80', '邮政编码': 'https://cnschema.openkg.cn/item/%E9%82%AE%E6%94%BF%E7%BC%96%E7%A0%81', '人口数量': 'https://cnschema.openkg.cn/item/%E4%BA%BA%E5%8F%A3%E6%95%B0%E9%87%8F', '所在城市': 'https://cnschema.openkg.cn/item/%E5%9F%8E%E5%B8%82/33549', '作者': 'https://cnschema.openkg.cn/item/%E4%BD%9C%E8%80%85/144157', '作曲': 'https://cnschema.openkg.cn/item/%E4%BD%9C%E6%9B%B2', '气候': 'https://cnschema.openkg.cn/item/%E6%B0%94%E5%80%99/384697', '嘉宾': 'https://cnschema.openkg.cn/item/%E5%98%89%E5%AE%BE/963541', '主演': 'https://cnschema.openkg.cn/item/%E4%B8%BB%E6%BC%94', '改编自': 'https://cnschema.openkg.cn/item/%E6%94%B9%E7%BC%96/3495588#viewPageContent', '创始人': 'https://cnschema.openkg.cn/item/%E5%88%9B%E5%A7%8B%E4%BA%BA/36538'}
[2024-06-06 14:41:26,133][pytorch_transformers.modeling_utils][INFO] - loading configuration file /home/cjh/domain/DeepKE/example/ner/standard/checkpoints/config.json
[2024-06-06 14:41:26,134][pytorch_transformers.modeling_utils][INFO] - Model config {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "directionality": "bidi",
  "finetuning_task": "ner",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "num_labels": 60,
  "output_attentions": false,
  "output_hidden_states": false,
  "output_past": true,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3,
  "pooler_size_per_head": 128,
  "pooler_type": "first_token_transform",
  "pruned_heads": {},
  "torchscript": false,
  "type_vocab_size": 2,
  "vocab_size": 21128
}

[2024-06-06 14:41:26,134][pytorch_transformers.modeling_utils][INFO] - loading weights file /home/cjh/domain/DeepKE/example/ner/standard/checkpoints/pytorch_model.bin
[2024-06-06 14:41:28,036][pytorch_transformers.tokenization_utils][INFO] - Model name '/home/cjh/domain/DeepKE/example/ner/standard/checkpoints' not found in model shortcut name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc). Assuming '/home/cjh/domain/DeepKE/example/ner/standard/checkpoints' is a path or url to a directory containing tokenizer files.
[2024-06-06 14:41:28,037][pytorch_transformers.tokenization_utils][INFO] - loading file /home/cjh/domain/DeepKE/example/ner/standard/checkpoints/vocab.txt
[2024-06-06 14:41:28,037][pytorch_transformers.tokenization_utils][INFO] - loading file /home/cjh/domain/DeepKE/example/ner/standard/checkpoints/added_tokens.json
[2024-06-06 14:41:28,037][pytorch_transformers.tokenization_utils][INFO] - loading file /home/cjh/domain/DeepKE/example/ner/standard/checkpoints/special_tokens_map.json
[2024-06-06 14:41:28,037][pytorch_transformers.tokenization_utils][INFO] - loading file /home/cjh/domain/DeepKE/example/ner/standard/checkpoints/tokenizer_config.json
[2024-06-06 14:41:28,053][__main__][INFO] - 林忠钦,男,汉族,1957年12月6日出生,浙江省宁波市人,机械工程专家,中国工程院院士,教育部科技委先进制造学部主任。曾任上海交通大学校长、党委副书记。
[2024-06-06 14:41:28,183][__main__][INFO] - [('林', 'B-YAS'), ('忠', 'I-YAS'), ('钦', 'I-YAS'), ('汉', 'B-ANO'), ('族', 'I-ANO'), ('1', 'B-BQF'), ('9', 'I-BQF'), ('5', 'I-BQF'), ('7', 'I-BQF'), ('年', 'I-BQF'), ('1', 'I-BQF'), ('2', 'I-BQF'), ('月', 'I-BQF'), ('6', 'I-BQF'), ('日', 'I-BQF'), ('浙', 'B-UFT'), ('江', 'I-UFT'), ('省', 'I-UFT'), ('宁', 'I-UFT'), ('波', 'I-UFT'), ('市', 'I-UFT'), ('中', 'B-CAR'), ('国', 'I-CAR')]
[2024-06-06 14:41:28,184][__main__][INFO] - {'林忠钦': '人物', '汉族': 'Text', '1957年12月6日': 'Date', '浙江省宁波市': '地点', '中国': '国家'}
[2024-06-06 14:41:28,184][preprocess][INFO] - use bert tokenizer...
[2024-06-06 14:41:30,245][__main__][INFO] - "林忠钦" 和 "汉族" 在句中关系为:"民族",置信度为1.00。
[2024-06-06 14:41:30,258][__main__][INFO] - {
  "@context": {
    "民族": "https://cnschema.openkg.cn/item/%E6%B0%91%E6%97%8F/665"
  },
  "@id": "林忠钦",
  "民族": {
    "@id": "汉族"
  }
}
[2024-06-06 14:41:30,259][preprocess][INFO] - use bert tokenizer...
[2024-06-06 14:41:31,145][__main__][INFO] - "林忠钦" 和 "1957年12月6日" 在句中关系为:"出生日期",置信度为1.00。
[2024-06-06 14:41:31,145][__main__][INFO] - {
  "@context": {
    "出生日期": "https://cnschema.openkg.cn/item/%E7%94%9F%E6%97%A5/2690537"
  },
  "@id": "林忠钦",
  "出生日期": {
    "@id": "1957年12月6日"
  }
}
[2024-06-06 14:41:31,146][preprocess][INFO] - use bert tokenizer...
[2024-06-06 14:41:31,440][__main__][INFO] - "林忠钦" 和 "浙江省宁波市" 在句中关系为:"出生地",置信度为0.99。
[2024-06-06 14:41:31,441][__main__][INFO] - {
  "@context": {
    "出生地": "https://null.url.here"
  },
  "@id": "林忠钦",
  "出生地": {
    "@id": "浙江省宁波市"
  }
}
[2024-06-06 14:41:31,441][preprocess][INFO] - use bert tokenizer...
[2024-06-06 14:41:31,731][__main__][INFO] - "林忠钦" 和 "中国" 在句中关系为:"国籍",置信度为0.99。
[2024-06-06 14:41:31,731][__main__][INFO] - {
  "@context": {
    "国籍": "https://cnschema.openkg.cn/item/%E5%9B%BD%E7%B1%8D"
  },
  "@id": "林忠钦",
  "国籍": {
    "@id": "中国"
  }
}
[2024-06-06 14:41:31,732][preprocess][INFO] - use bert tokenizer...
[2024-06-06 14:41:32,005][__main__][INFO] - "汉族" 和 "1957年12月6日" 在句中关系为:"出生日期",置信度为0.97。
[2024-06-06 14:41:32,005][__main__][INFO] - {
  "@context": {
    "出生日期": "https://cnschema.openkg.cn/item/%E7%94%9F%E6%97%A5/2690537"
  },
  "@id": "汉族",
  "出生日期": {
    "@id": "1957年12月6日"
  }
}
[2024-06-06 14:41:32,006][preprocess][INFO] - use bert tokenizer...
[2024-06-06 14:41:32,272][__main__][INFO] - "汉族" 和 "浙江省宁波市" 在句中关系为:"民族",置信度为0.55。
[2024-06-06 14:41:32,273][preprocess][INFO] - use bert tokenizer...
[2024-06-06 14:41:32,537][__main__][INFO] - "汉族" 和 "中国" 在句中关系为:"民族",置信度为0.55。
[2024-06-06 14:41:32,537][preprocess][INFO] - use bert tokenizer...
[2024-06-06 14:41:32,846][__main__][INFO] - "1957年12月6日" 和 "浙江省宁波市" 在句中关系为:"出生日期",置信度为0.97。
[2024-06-06 14:41:32,847][__main__][INFO] - {
  "@context": {
    "出生日期": "https://cnschema.openkg.cn/item/%E7%94%9F%E6%97%A5/2690537"
  },
  "@id": "1957年12月6日",
  "出生日期": {
    "@id": "浙江省宁波市"
  }
}
[2024-06-06 14:41:32,847][preprocess][INFO] - use bert tokenizer...
[2024-06-06 14:41:33,219][__main__][INFO] - "1957年12月6日" 和 "中国" 在句中关系为:"出生日期",置信度为0.92。
[2024-06-06 14:41:33,220][__main__][INFO] - {
  "@context": {
    "出生日期": "https://cnschema.openkg.cn/item/%E7%94%9F%E6%97%A5/2690537"
  },
  "@id": "1957年12月6日",
  "出生日期": {
    "@id": "中国"
  }
}
[2024-06-06 14:41:33,220][preprocess][INFO] - use bert tokenizer...
[2024-06-06 14:41:33,526][__main__][INFO] - "浙江省宁波市" 和 "中国" 在句中关系为:"出生地",置信度为0.84。
[2024-06-06 14:41:33,527][__main__][INFO] - {
  "@context": {
    "出生地": "https://null.url.here"
  },
  "@id": "浙江省宁波市",
  "出生地": {
    "@id": "中国"
  }
}

As you can see, entity extraction worked well here: it correctly segmented the five entities ({'林忠钦': '人物', '汉族': 'Text', '1957年12月6日': 'Date', '浙江省宁波市': '地点', '中国': '国家'}). But the relation extraction is absurd. Later in the log it outputs things like '"浙江省宁波市" and "中国" have the in-sentence relation "出生地" (place of birth), confidence 0.84', '"汉族" and "浙江省宁波市" have the relation "民族" (ethnicity), confidence 0.55', and '"1957年12月6日" and "浙江省宁波市" have the relation "出生日期" (date of birth), confidence 0.97', and so on (see the log for details). These wrong relation extractions sometimes come with very high confidence, almost as if the entities had been combined at random.

I don't know how to make the model's relation predictions more accurate and avoid these strange relations. Can anyone offer any help? Thanks!

@zxlzr
Contributor

zxlzr commented Jun 6, 2024

Hi, the model was trained on one triple per sentence. You can run entity recognition first, then for the same sentence build inputs of one entity pair + the sentence and run relation classification multiple times; this mitigates the problem above. If you want end-to-end extraction, you can also use OneKE directly: http://oneke.openkg.cn/
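
For readers following along, here is a minimal sketch of the suggested two-stage pipeline. The `extract_entities` and `classify_relation` callables are hypothetical placeholders standing in for DeepKE's NER and RE predict steps, not actual DeepKE APIs:

```python
from itertools import combinations

def pairwise_relation_extraction(sentence, extract_entities, classify_relation,
                                 confidence_threshold=0.9):
    """Two-stage pipeline: NER first, then one relation classification
    call per entity pair on the same sentence."""
    # Stage 1: named entity recognition, e.g. {'林忠钦': '人物', ...}
    entities = extract_entities(sentence)

    triples = []
    # Stage 2: for n entities, run C(n, 2) relation classifications,
    # feeding one (head, tail) pair + the sentence per call.
    for (head, head_type), (tail, tail_type) in combinations(entities.items(), 2):
        relation, confidence = classify_relation(sentence, head, head_type,
                                                 tail, tail_type)
        # Keep only predictions above a confidence threshold.
        if confidence >= confidence_threshold:
            triples.append((head, relation, tail, confidence))
    return triples
```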

@zxlzr zxlzr added the question Further information is requested label Jun 6, 2024
@Daniel-ChenJH
Author

for the same sentence build inputs of one entity pair + the sentence and run relation classification multiple times

Thanks for the reply! I see, so the current model can only handle simple sentences with two entities and one relation, right?

Also, do you mean a two-stage approach: after extracting multiple entities from a complex sentence, pair them up two at a time and run relation extraction once per pair? For the sentence above with five entities, that would take C(5,2) = 10 relation extractions, and then you keep the few with the highest confidence?

I'm also curious how the current version handles relation extraction over multiple entities. From what you say, it is not done by judging each pairwise combination one at a time?

@zxlzr
Contributor

zxlzr commented Jun 6, 2024

1. Correct.
2. Correct.
3. It should feed in multiple inputs directly and extract by iterating over them (it has been a while since development and I have forgotten some of the details). The current multi-entity relation extraction capability is limited. You could filter out the very absurd errors based on entity types, or annotate some data yourself and retrain; that may improve the results.

@Daniel-ChenJH
Author

Thanks! I followed your approach and ran relation extraction on a single entity pair for the sentence "林忠钦,男,汉族,1957年12月6日出生,浙江省宁波市人,机械工程专家,中国工程院院士,教育部科技委先进制造学部主任。曾任上海交通大学校长、党委副书记。". Why does it output '"1957年12月6日" and "浙江省宁波市" have the in-sentence relation "出生日期" (date of birth), confidence 0.97' here?

How should I interpret this result? (1) The model understood correctly: it ignored 浙江省宁波市 and simply read "1957年12月6日" as a date of birth, which is consistent with the original sentence; or (2) the model understood incorrectly and interpreted "1957年12月6日" as the date of birth of 浙江省宁波市?

The same thing happened when I ran relation extraction on the pair "汉族" and "1957年12月6日": [2024-06-06 16:57:01,502][main][INFO] - "汉族" 和 "1957年12月6日" 在句中关系为:"出生日期",置信度为0.97。

For a nonsensical entity pair like this, my understanding is that the model should instead output an indeterminate relation with very low confidence, right?

(deepke) cjh@DESKTOP-NERJPTB:~/domain/DeepKE/example/re/standard$ python predict.py
/home/cjh/anaconda3/envs/deepke/lib/python3.8/site-packages/hydra/core/utils.py:207: UserWarning:
Using config_path to specify the config name is deprecated, specify the config name via config_name
See https://hydra.cc/docs/next/upgrades/0.11_to_1.0/config_path_changes
  warnings.warn(category=UserWarning, message=msg)
/home/cjh/anaconda3/envs/deepke/lib/python3.8/site-packages/hydra/plugins/config_source.py:190: UserWarning:
Missing @package directive hydra/output/custom.yaml in file:///home/cjh/domain/DeepKE/example/re/standard/conf.
See https://hydra.cc/docs/next/upgrades/0.11_to_1.0/adding_a_package_directive
  warnings.warn(message=msg, category=UserWarning)
/home/cjh/anaconda3/envs/deepke/lib/python3.8/site-packages/hydra/plugins/config_source.py:190: UserWarning:
Missing @package directive preprocess.yaml in file:///home/cjh/domain/DeepKE/example/re/standard/conf.
See https://hydra.cc/docs/next/upgrades/0.11_to_1.0/adding_a_package_directive
  warnings.warn(message=msg, category=UserWarning)
/home/cjh/anaconda3/envs/deepke/lib/python3.8/site-packages/hydra/plugins/config_source.py:190: UserWarning:
Missing @package directive train.yaml in file:///home/cjh/domain/DeepKE/example/re/standard/conf.
See https://hydra.cc/docs/next/upgrades/0.11_to_1.0/adding_a_package_directive
  warnings.warn(message=msg, category=UserWarning)
/home/cjh/anaconda3/envs/deepke/lib/python3.8/site-packages/hydra/plugins/config_source.py:190: UserWarning:
Missing @package directive embedding.yaml in file:///home/cjh/domain/DeepKE/example/re/standard/conf.
See https://hydra.cc/docs/next/upgrades/0.11_to_1.0/adding_a_package_directive
  warnings.warn(message=msg, category=UserWarning)
/home/cjh/anaconda3/envs/deepke/lib/python3.8/site-packages/hydra/plugins/config_source.py:190: UserWarning:
Missing @package directive predict.yaml in file:///home/cjh/domain/DeepKE/example/re/standard/conf.
See https://hydra.cc/docs/next/upgrades/0.11_to_1.0/adding_a_package_directive
  warnings.warn(message=msg, category=UserWarning)
/home/cjh/anaconda3/envs/deepke/lib/python3.8/site-packages/hydra/plugins/config_source.py:190: UserWarning:
Missing @package directive model/lm.yaml in file:///home/cjh/domain/DeepKE/example/re/standard/conf.
See https://hydra.cc/docs/next/upgrades/0.11_to_1.0/adding_a_package_directive
  warnings.warn(message=msg, category=UserWarning)
/home/cjh/anaconda3/envs/deepke/lib/python3.8/site-packages/omegaconf/basecontainer.py:225: UserWarning: cfg.pretty() is deprecated and will be removed in a future version.
Use OmegaConf.to_yaml(cfg)

  warnings.warn(
cwd: /home/cjh/domain/DeepKE/example/re/standard
use_wandb: false
preprocess: true
data_path: data/origin
out_path: data/out
chinese_split: true
replace_entity_with_type: true
replace_entity_with_scope: true
min_freq: 3
pos_limit: 30
seed: 1
use_gpu: true
gpu_id: 0
epoch: 50
batch_size: 32
learning_rate: 0.0003
lr_factor: 0.7
lr_patience: 3
weight_decay: 0.001
early_stopping_patience: 6
train_log: true
log_interval: 10
show_plot: false
only_comparison_plot: false
plot_utils: matplot
predict_plot: false
use_multi_gpu: false
gpu_ids: 0,1
vocab_size: ???
word_dim: 60
pos_size: 62
pos_dim: 10
dim_strategy: sum
num_relations: 51
fp: /home/cjh/domain/DeepKE/example/re/standard/re_bert.pth
model_name: lm
lm_file: /home/cjh/bert/chinese_wwm_pytorch
num_hidden_layers: 1
type_rnn: LSTM
input_size: 768
hidden_size: 100
num_layers: 1
dropout: 0.3
bidirectional: true
last_layer_hn: true

是否使用范例[y/n],退出请输入: exit .... n
请输入句子:林忠钦,男,汉族,1957年12月6日出生,浙江省宁波市人,机械工程专家,中国工程院院士,教育部科技委先进制造学部 主任。曾任上海交通大学校长、党委副书记。
请输入句中需要预测关系的头实体:1957年12月6日
请输入头实体类型:Date
请输入句中需要预测关系的尾实体:浙江省宁波市
请输入尾实体类型:地点
[2024-06-06 16:53:38,447][deepke.relation_extraction.standard.tools.preprocess][INFO] - use bert tokenizer...
[2024-06-06 16:53:38,466][__main__][INFO] - device: cpu
Some weights of the model checkpoint at /home/cjh/bert/chinese_wwm_pytorch were not used when initializing BertModel: ['bert.encoder.layer.9.attention.self.value.weight', 'bert.encoder.layer.3.output.LayerNorm.bias', 'bert.encoder.layer.4.output.dense.bias', 'bert.encoder.layer.7.output.LayerNorm.bias', 'bert.encoder.layer.3.output.LayerNorm.weight', 'bert.encoder.layer.9.attention.self.key.weight', 'bert.encoder.layer.6.attention.self.query.bias', 'bert.encoder.layer.10.attention.output.LayerNorm.bias', 'bert.encoder.layer.3.attention.self.key.bias', 'cls.predictions.decoder.weight', 'bert.encoder.layer.8.attention.self.query.bias', 'bert.encoder.layer.11.intermediate.dense.bias', 'bert.encoder.layer.11.attention.output.dense.weight', 'bert.encoder.layer.3.intermediate.dense.weight', 'bert.encoder.layer.5.attention.self.value.bias', 'bert.encoder.layer.10.intermediate.dense.weight', 'bert.encoder.layer.2.intermediate.dense.bias', 'bert.encoder.layer.6.attention.self.value.bias', 'bert.encoder.layer.2.output.LayerNorm.weight', 'bert.encoder.layer.6.attention.output.LayerNorm.weight', 'bert.encoder.layer.8.intermediate.dense.weight', 'cls.predictions.transform.dense.weight', 'bert.encoder.layer.9.output.dense.bias', 'bert.encoder.layer.8.attention.output.dense.bias', 'bert.encoder.layer.7.attention.self.key.bias', 'bert.encoder.layer.10.attention.output.dense.bias', 'bert.encoder.layer.10.output.dense.weight', 'bert.encoder.layer.9.attention.output.dense.weight', 'bert.encoder.layer.5.attention.self.key.weight', 'bert.encoder.layer.5.attention.self.value.weight', 'bert.encoder.layer.5.attention.output.LayerNorm.weight', 'bert.encoder.layer.1.attention.self.query.weight', 'bert.encoder.layer.9.output.LayerNorm.weight', 'cls.seq_relationship.weight', 'bert.encoder.layer.2.intermediate.dense.weight', 'bert.encoder.layer.4.output.LayerNorm.weight', 'cls.seq_relationship.bias', 'bert.encoder.layer.6.attention.self.query.weight', 'bert.encoder.layer.8.output.dense.bias', 'bert.encoder.layer.10.attention.self.key.bias', 'bert.encoder.layer.3.attention.self.query.bias', 'bert.encoder.layer.11.attention.self.query.bias', 'bert.encoder.layer.11.output.dense.weight', 'cls.predictions.bias', 'bert.encoder.layer.7.attention.output.LayerNorm.weight', 'bert.encoder.layer.7.attention.output.dense.bias', 'bert.encoder.layer.5.output.dense.weight', 'bert.encoder.layer.7.output.dense.bias', 'bert.encoder.layer.7.attention.self.query.weight', 'bert.encoder.layer.3.output.dense.weight', 'bert.encoder.layer.7.intermediate.dense.weight', 'bert.encoder.layer.11.attention.output.LayerNorm.bias', 'bert.encoder.layer.7.intermediate.dense.bias', 'bert.encoder.layer.2.attention.self.value.weight', 'bert.encoder.layer.7.attention.self.value.weight', 'bert.encoder.layer.11.attention.self.key.weight', 'bert.encoder.layer.8.attention.self.query.weight', 'bert.encoder.layer.1.output.LayerNorm.bias', 'bert.encoder.layer.3.attention.output.dense.weight', 'bert.encoder.layer.2.attention.output.dense.weight', 'bert.encoder.layer.4.attention.self.query.bias', 'bert.encoder.layer.5.output.LayerNorm.weight', 'bert.encoder.layer.5.intermediate.dense.weight', 'bert.encoder.layer.6.attention.output.dense.weight', 'bert.encoder.layer.7.attention.output.LayerNorm.bias', 'bert.encoder.layer.11.output.LayerNorm.weight', 'bert.encoder.layer.8.output.LayerNorm.weight', 'bert.encoder.layer.8.attention.self.value.bias', 'bert.encoder.layer.10.attention.self.query.bias', 'bert.encoder.layer.5.attention.output.dense.bias', 
'bert.encoder.layer.9.attention.self.key.bias', 'bert.encoder.layer.8.attention.self.value.weight', 'bert.encoder.layer.4.intermediate.dense.bias', 'bert.encoder.layer.10.intermediate.dense.bias', 'bert.encoder.layer.10.output.LayerNorm.weight', 'bert.encoder.layer.1.attention.output.dense.bias', 'bert.encoder.layer.5.output.dense.bias', 'bert.encoder.layer.10.attention.self.value.weight', 'bert.encoder.layer.10.attention.self.value.bias', 'bert.encoder.layer.8.intermediate.dense.bias', 'bert.encoder.layer.9.attention.self.query.bias', 'bert.encoder.layer.6.output.dense.weight', 'bert.encoder.layer.11.output.dense.bias', 'bert.encoder.layer.5.attention.output.dense.weight', 'bert.encoder.layer.3.attention.self.value.bias', 'cls.predictions.transform.LayerNorm.bias', 'bert.encoder.layer.9.output.LayerNorm.bias', 'bert.encoder.layer.1.output.dense.bias', 'bert.encoder.layer.10.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.self.value.bias', 'bert.encoder.layer.6.output.LayerNorm.bias', 'bert.encoder.layer.6.attention.output.LayerNorm.bias', 'bert.encoder.layer.9.attention.output.LayerNorm.weight', 'bert.encoder.layer.2.output.LayerNorm.bias', 'bert.encoder.layer.2.attention.self.query.weight', 'bert.encoder.layer.8.attention.output.LayerNorm.bias', 'bert.encoder.layer.4.attention.self.query.weight', 'bert.encoder.layer.1.attention.output.dense.weight', 'bert.encoder.layer.2.attention.self.query.bias', 'bert.encoder.layer.7.attention.self.value.bias', 'bert.encoder.layer.7.attention.output.dense.weight', 'bert.encoder.layer.9.intermediate.dense.bias', 'bert.encoder.layer.9.attention.output.dense.bias', 'bert.encoder.layer.5.attention.self.query.bias', 'bert.encoder.layer.8.attention.output.dense.weight', 'bert.encoder.layer.3.intermediate.dense.bias', 'bert.encoder.layer.4.attention.output.dense.bias', 'bert.encoder.layer.7.output.dense.weight', 'bert.encoder.layer.9.attention.self.query.weight', 'bert.encoder.layer.1.attention.self.value.bias', 'bert.encoder.layer.1.output.LayerNorm.weight', 'bert.encoder.layer.10.output.dense.bias', 'bert.encoder.layer.11.attention.output.LayerNorm.weight', 'bert.encoder.layer.2.attention.output.LayerNorm.bias', 'bert.encoder.layer.11.intermediate.dense.weight', 'bert.encoder.layer.2.attention.self.key.weight', 'bert.encoder.layer.6.attention.self.key.bias', 'bert.encoder.layer.3.attention.output.LayerNorm.bias', 'bert.encoder.layer.4.attention.self.key.weight', 'bert.encoder.layer.5.attention.self.key.bias', 'bert.encoder.layer.11.output.LayerNorm.bias', 'bert.encoder.layer.8.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.self.key.weight', 'bert.encoder.layer.2.output.dense.bias', 'bert.encoder.layer.4.attention.output.LayerNorm.bias', 'bert.encoder.layer.1.attention.self.key.weight', 'bert.encoder.layer.2.attention.output.dense.bias', 'bert.encoder.layer.3.attention.output.dense.bias', 'bert.encoder.layer.4.attention.output.dense.weight', 'bert.encoder.layer.2.attention.self.value.bias', 'bert.encoder.layer.1.attention.output.LayerNorm.weight', 'bert.encoder.layer.4.attention.self.key.bias', 'bert.encoder.layer.7.output.LayerNorm.weight', 'bert.encoder.layer.6.intermediate.dense.weight', 'bert.encoder.layer.9.attention.self.value.bias', 'bert.encoder.layer.5.attention.output.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'bert.encoder.layer.4.attention.self.value.weight', 'bert.encoder.layer.3.attention.self.value.weight', 'bert.encoder.layer.6.output.LayerNorm.weight', 
'bert.encoder.layer.11.attention.self.query.weight', 'bert.encoder.layer.8.attention.output.LayerNorm.weight', 'bert.encoder.layer.6.attention.self.key.weight', 'bert.encoder.layer.6.attention.self.value.weight', 'bert.encoder.layer.8.attention.self.key.weight', 'bert.encoder.layer.1.output.dense.weight', 'bert.encoder.layer.10.attention.output.dense.weight', 'bert.encoder.layer.9.intermediate.dense.weight', 'bert.encoder.layer.8.attention.self.key.bias', 'bert.encoder.layer.8.output.dense.weight', 'bert.encoder.layer.3.attention.output.LayerNorm.weight', 'bert.encoder.layer.5.attention.self.query.weight', 'bert.encoder.layer.9.attention.output.LayerNorm.bias', 'bert.encoder.layer.4.attention.self.value.bias', 'bert.encoder.layer.10.attention.output.LayerNorm.weight', 'bert.encoder.layer.10.attention.self.query.weight', 'bert.encoder.layer.11.attention.self.key.bias', 'bert.encoder.layer.1.attention.self.value.weight', 'bert.encoder.layer.3.output.dense.bias', 'bert.encoder.layer.1.attention.self.query.bias', 'bert.encoder.layer.5.output.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'bert.encoder.layer.1.attention.self.key.bias', 'bert.encoder.layer.2.attention.self.key.bias', 'bert.encoder.layer.9.output.dense.weight', 'bert.encoder.layer.2.attention.output.LayerNorm.weight', 'bert.encoder.layer.4.attention.output.LayerNorm.weight', 'bert.encoder.layer.3.attention.self.query.weight', 'bert.encoder.layer.6.attention.output.dense.bias', 'bert.encoder.layer.2.output.dense.weight', 'bert.encoder.layer.11.attention.self.value.weight', 'bert.encoder.layer.1.intermediate.dense.weight', 'bert.encoder.layer.6.intermediate.dense.bias', 'bert.encoder.layer.7.attention.self.key.weight', 'bert.encoder.layer.4.output.dense.weight', 'bert.encoder.layer.4.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.output.dense.bias', 'bert.encoder.layer.6.output.dense.bias', 'bert.encoder.layer.7.attention.self.query.bias', 'bert.encoder.layer.1.attention.output.LayerNorm.bias', 'bert.encoder.layer.3.attention.self.key.weight', 'bert.encoder.layer.5.intermediate.dense.bias', 'bert.encoder.layer.4.intermediate.dense.weight', 'bert.encoder.layer.1.intermediate.dense.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
/home/cjh/anaconda3/envs/deepke/lib/python3.8/site-packages/torch/nn/modules/rnn.py:62: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.3 and num_layers=1
  warnings.warn("dropout option adds dropout after all but last "
[2024-06-06 16:53:38,764][__main__][INFO] - model name: lm
[2024-06-06 16:53:38,764][__main__][INFO] -
 LM(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(21128, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=768, out_features=3072, bias=True)
            (intermediate_act_fn): GELUActivation()
          )
          (output): BertOutput(
            (dense): Linear(in_features=3072, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
    )
    (pooler): BertPooler(
      (dense): Linear(in_features=768, out_features=768, bias=True)
      (activation): Tanh()
    )
  )
  (bilstm): RNN(
    (rnn): LSTM(768, 50, batch_first=True, dropout=0.3, bidirectional=True)
  )
  (fc): Linear(in_features=100, out_features=51, bias=True)
  (dropout): Dropout(p=0.3, inplace=False)
)
[2024-06-06 16:53:38,848][__main__][INFO] - "1957年12月6日" 和 "浙江省宁波市" 在句中关系为:"出生日期",置信度为0.97。

@zxlzr
Contributor

zxlzr commented Jun 6, 2024

Hi, both of these should be model errors. You can filter out this kind of output with simple rules, for example a place name and a date can never hold a "born in" relation.
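
As a concrete illustration, here is a minimal sketch of such rule-based post-filtering. The type constraints below are illustrative assumptions, not the official cnSchema constraint list:

```python
# Allowed (head type, tail type) combinations per relation; extend as needed.
# NOTE: these mappings are illustrative examples, not the official cnSchema schema.
ALLOWED_TYPES = {
    '出生日期': {('人物', 'Date'), ('历史人物', 'Date')},  # date of birth
    '出生地':   {('人物', '地点'), ('人物', '城市')},       # place of birth
    '民族':     {('人物', 'Text')},                         # ethnicity
    '国籍':     {('人物', '国家')},                         # nationality
}

def is_plausible(relation, head_type, tail_type):
    """Reject triples whose entity types violate the relation's type rules,
    e.g. a Date head with a 地点 (place) tail for 出生日期."""
    allowed = ALLOWED_TYPES.get(relation)
    if allowed is None:
        return True  # no rule defined for this relation: keep the prediction
    return (head_type, tail_type) in allowed

# The spurious triple from the log is filtered out, the correct one is kept.
assert not is_plausible('出生日期', 'Date', '地点')  # "1957年12月6日" → "浙江省宁波市"
assert is_plausible('出生日期', '人物', 'Date')      # "林忠钦" → "1957年12月6日"
```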

@zxlzr
Contributor

zxlzr commented Jun 12, 2024

Do you have any other questions?

@Daniel-ChenJH
Author

Not for now, thanks! I'll give OneKE a try later.

@1223243

1223243 commented Jul 30, 2024

Excuse me, do you know how to debug this code in VS Code?
