Search the abstracts for specific keywords and output each hit together with its PubMed ID.
【Setup】
Python 2.7.3
(NOTE: Python 2.4 didn't have xml.etree)
【Preparation】
- abstract.xml: XML data (see the previous post)
- words.txt: list of keywords, one per line
------------------------PYTHON--------------------------------
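### get abstract from xml file
from xml.etree.ElementTree import parse
tree = parse("abstract.xml")
elem = tree.getroot()
# NOTE: the name "dict" shadows the built-in type; it is kept because the code below uses it
dict = []
for e in elem.iter("PubmedArticle"):  # iter() replaces the deprecated getiterator()
    dict.append({
        "pmid": e.find("MedlineCitation").findtext("PMID"),
        "abst": e.findtext(".//AbstractText", "ND")  # "ND" when there is no abstract
    })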
### count number of papers
>>> len(dict)
2
### read keywords (one per line)
with open('words.txt', 'r') as f:
    keyword = f.read().splitlines()
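### extract
result = []
for i in range(len(dict)):  # loop over every paper, not a hard-coded 2
    # split() cuts on whitespace only, so "database." does not match "database"
    for word in dict[i].get('abst').split():
        if word in keyword:
            result.append({
                "pmid" : dict[i].get('pmid'),
                "hit" : word
            })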
>>> result
[{'pmid': '23245335', 'hit': 'database'}, {'pmid': '23245335', 'hit': 'database'}, {'pmid': '23197657', 'hit': 'database'}]
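### OUTPUT TO CSV
import csv
# uniq_result was never defined above; assumed here to be the de-duplicated (pmid, hit) pairs
uniq_result = sorted(set((r['pmid'], r['hit']) for r in result))
with open('uniq_result.csv', 'wb') as f:  # 'wb' is the right mode for csv on Python 2
    w = csv.writer(f, quoting=csv.QUOTE_ALL)
    w.writerow(['pmid', 'hit'])
    w.writerows(uniq_result)  # one row per pair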
-------------------------------------------------------------
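For reference, here is a minimal sketch of the same steps packed into two functions. It is written for Python 3, not the Python 2.7.3 used above, and it differs from the code above in one deliberate way: words are matched case-insensitively after stripping punctuation, so "Database." counts as a hit for "database". The function names (find_keyword_hits, write_hits_csv) and the sample XML are hypothetical, made up only for illustration; the sample mimics the PubmedArticle / MedlineCitation / PMID / AbstractText layout that the code above relies on.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET


def find_keyword_hits(xml_text, keywords):
    """Return sorted, unique (pmid, keyword) pairs found in the abstracts."""
    wanted = set(k.strip().lower() for k in keywords if k.strip())
    hits = set()
    root = ET.fromstring(xml_text)
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        abstract = article.findtext(".//AbstractText", "ND")
        # Tokenise on letters/digits so "database." and "Database" both match.
        for token in re.findall(r"[A-Za-z0-9]+", abstract.lower()):
            if token in wanted:
                hits.add((pmid, token))
    return sorted(hits)


def write_hits_csv(hits, fileobj):
    """Write a header plus one fully quoted row per (pmid, keyword) pair."""
    writer = csv.writer(fileobj, quoting=csv.QUOTE_ALL)
    writer.writerow(["pmid", "hit"])
    writer.writerows(hits)


# Hypothetical sample data, only to show the expected shape of the input.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article><Abstract>
        <AbstractText>A new Database. The database is open.</AbstractText>
      </Abstract></Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article><Abstract>
        <AbstractText>No match here.</AbstractText>
      </Abstract></Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""

hits = find_keyword_hits(SAMPLE_XML, ["database", "genome"])
buffer = io.StringIO()  # in-memory stand-in for a real file
write_hits_csv(hits, buffer)
print(hits)
print(buffer.getvalue())
```

To write a real file on Python 3, open it in text mode with newline="" (for example open("uniq_result.csv", "w", newline="")) instead of the 'wb' mode that Python 2 needs. Collecting the hits in a set is what removes the duplicate rows seen in the result list above.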