Table 3. Example of the feature template with context window n

Source: Automatic extraction and structuration of soil–environment relationship information from soil survey reports


1) Wi is the current word feature, Wi–n represents the nth word feature in front of the current word, Wi–1 is the previous word feature, Wi+1 is the next word feature, Wi+n represents the nth word feature behind the current word, and Wi–1/Wi is a feature
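The footnote's notation can be illustrated with a minimal Python sketch that collects the word features in a window of size n around position i. Because the footnote is cut off, the sketch assumes that Wi–1/Wi denotes a combined feature of the previous and current words. The function name `window_features`, the key format, and the `<PAD>` symbol for out-of-sentence positions are hypothetical choices, not taken from the paper.

```python
from typing import Dict, List

PAD = "<PAD>"  # hypothetical placeholder for positions outside the sentence


def window_features(words: List[str], i: int, n: int = 2) -> Dict[str, str]:
    """Collect the features W[i-n] .. W[i+n] plus the combined W[i-1]/W[i] feature."""
    feats: Dict[str, str] = {}
    for offset in range(-n, n + 1):
        j = i + offset
        # Keys look like "W[i-1]", "W[i]", "W[i+1]".
        key = "W[i]" if offset == 0 else f"W[i{offset:+d}]"
        feats[key] = words[j] if 0 <= j < len(words) else PAD
    # Assumed reading of the truncated footnote: previous and current word combined.
    prev = words[i - 1] if i > 0 else PAD
    feats["W[i-1]/W[i]"] = f"{prev}/{words[i]}"
    return feats


if __name__ == "__main__":
    # Segmented example from the text: "20°以上" -> "20", "°", "以上"
    print(window_features(["20", "°", "以上"], i=1, n=1))
```

With n = 1 and the current word "°", the sketch yields the previous word "20", the next word "以上", and the combined feature "20/°".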

The above process was accomplished mainly through four steps: Chinese word segmentation (e.g., “20°以上” (over 20 degrees) is segmented into “20, °, 以上 (over)”), part-of-speech tagging (e.g., “20[number], °[quantifier]”), prefix and suffix matching (e.g., “以上 (over)[suffix]”), and rule matching (e.g., “20[number]+°[quantifier]+以上 (over)[suffix]” matches the rule “number+quantifier+suffix”). Chinese word segmentation and part-of-speech tagging can be carried out with free software tools such as NLTK (http://www.nltk.org), Stanford NLP (http://stanfordnlp.github.io/CoreNLP), and LTP (http://github.com/HIT-SCIR/ltp).

Prefix and suffix matching requires a dictionary of prefixes and suffixes. The dictionary is established as follows: 1) manually summarize, in terms of semantics, the prefix and suffix words that appear in the variable values, and 2) expand them using the synonym dictionary. Then, combine the adjacent tags, including the prefix, number, punctuation, quantifier and suffix, into a sequence (e.g., “20[number], °[quantifier], 以上 (over)[suffix]” is combined into “number+quantifier+suffix”) and extract the potential variable value by matching the sequence against the rules in a rule base.

To cover diverse types of variable values, a rule base needs to be established. The rule base can be generalized manually from variable value instances. Examples of the rules are shown in Table 1. Several rules in the base may match the same text; therefore, the best variable value still has to be picked out from among the potential variable values. The maximum matching method was used: matching starts from the left side, and the longest match in the match set is selected. Ultimately, the longest match was taken as the most suitable soil variable value.
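The rule-matching and maximum-matching steps can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the input is assumed to be already segmented and tagged, the contents of `RULES` are a hypothetical stand-in for the rule base in Table 1, and the names `longest_match_at` and `extract_values` and the tag `other` are invented for the example.

```python
from typing import List, Sequence, Tuple

Token = Tuple[str, str]  # (word, tag), e.g. ("20", "number")

# Hypothetical rule base: each rule is a sequence of tags.
RULES: List[Tuple[str, ...]] = [
    ("number", "quantifier"),
    ("number", "quantifier", "suffix"),
    ("prefix", "number", "quantifier"),
]


def longest_match_at(tokens: Sequence[Token], start: int,
                     rules: Sequence[Tuple[str, ...]]) -> int:
    """Return the length of the longest rule matching the tags at `start` (0 if none)."""
    tags = tuple(tag for _, tag in tokens)
    best = 0
    for rule in rules:
        # A slice that runs past the end is shorter than the rule, so it cannot match.
        if tags[start:start + len(rule)] == tuple(rule) and len(rule) > best:
            best = len(rule)
    return best


def extract_values(tokens: Sequence[Token],
                   rules: Sequence[Tuple[str, ...]] = RULES) -> List[str]:
    """Scan left to right, keeping the longest rule match at each position."""
    values: List[str] = []
    i = 0
    while i < len(tokens):
        length = longest_match_at(tokens, i, rules)
        if length:
            values.append("".join(word for word, _ in tokens[i:i + length]))
            i += length  # continue after the matched span
        else:
            i += 1
    return values


if __name__ == "__main__":
    tagged = [("20", "number"), ("°", "quantifier"), ("以上", "suffix")]
    print(extract_values(tagged))
```

Here both “number+quantifier” and “number+quantifier+suffix” match at the first token, and the longer rule wins, so the whole string “20°以上” is returned as the candidate variable value.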