Gopher Knowledge Re-engineering
Downloading Gopher Knowledge from Revelation
Burke has created a process that dumps all Gopher tables into a bunch of binary DOS files. I then use a Java tool to parse those DOS files into a PostgreSQL database as ID-BLOB records. Finally, I use Gunther's trick w/ Saxon to generate XML files from the PostgreSQL database from an XML template. So, all Gopher data from the development server (not all production patient data) makes it into PostgreSQL. Currently only a few XML templates exist, namely for the the TERM tables.
Burke has spent quite some sweat over trying a better pipe between Gopher & RDB worlds, but with little luck. We could evolve this data base export technique toward an automated process.
Uploading Gopher Dictionary to Relational Database
Gunther has taken Burke's XML export and mingled it through his tools to quickly upload parts of XML data into Oracle. Just a few tables exist right now to support Linas' and Gunther's explorations into the drug set terms in Gopher. Our goal is to compare the drug terms with NDF-RT and eventually to enhance Gopher knowledge from NDF-RT. Anyway, the data presently is on Gunther's Oracle instance to which Linas and others have access (you can have access too.)
We have one Gopher concept table:
CREATE TABLE UMLS.GOPHER_CONCEPT (
ID NUMBER NOT NULL,
NAME VARCHAR2(256) NULL,
TYPE VARCHAR2(16) NULL,
SOURCE VARCHAR2(64) NULL,
CLASS VARCHAR2(16) NULL,
UNITS VARCHAR2(64) NULL,
PREGNANCY_RISK VARCHAR2(1) NULL,
PRIMARY KEY(ID)
)
And another with concept relationships, also called "role" in the literature. There are many such relationships in the Gopher somewhere, but we presently only picked 2, i.e., set membership and interaction relationship, and have not quite cleaned those up:
CREATE TABLE UMLS.GOPHER_ROLE (
TYPE_ID VARCHAR2(64) NULL,
SOURCE_ID NUMBER NULL,
TARGET_ID NUMBER NULL,
EXTRA_ID NUMBER NULL
)
For set membership this table is easy, SOURCE_ID is the member and TARGET_ID is the set. For interacts_with relationship, we find that the interaction relationship relates the EXTRA_ID (should be source), to the TARGET_ID. These are usually (not always, for good reasons) set terms. The SOURCE_ID seems to be redundant, it is just a member of the set referenced by EXTRA_ID.
Because Gopher sets can contain other sets, and one frequently wants to find all the elementary members of a set, we use the standard transitive closure approach along the lines of the Warshall algorithm. I.e., we first prepare a transitive set table:
DROP TABLE UMLS.GOPHER_SET_PLUS GO CREATE TABLE UMLS.GOPHER_SET_PLUS AS SELECT SOURCE_ID, TARGET_ID, 1 AS DISTANCE FROM UMLS.GOPHER_ROLE WHERE TYPE_ID = 'set'
3328 record(s) affected
and then repeat the induction step
INSERT INTO UMLS.GOPHER_SET_PLUS
SELECT R1.SOURCE_ID, R2.TARGET_ID, R1.DISTANCE+R2.DISTANCE
FROM UMLS.GOPHER_SET_PLUS R1
INNER JOIN UMLS.GOPHER_SET_PLUS R2
ON(R2.SOURCE_ID = R1.TARGET_ID)
WHERE NOT EXISTS (SELECT 1 FROM UMLS.GOPHER_SET_PLUS X
WHERE X.SOURCE_ID = R1.SOURCE_ID
AND X.TARGET_ID = R2.TARGET_ID)
until it yields no new rows. Which happens quickly in only 3 steps:
- 1646 record(s) affected
- 1293 record(s) affected
- 0 record(s) affected
