Gopher Knowledge Re-engineering

Downloading Gopher Knowledge from Revelation

Burke has created a process that dumps all Gopher tables into a bunch of binary DOS files. I then use a Java tool to parse those DOS files into a PostgreSQL database as ID-BLOB records. Finally, I use Gunther's trick w/ Saxon to generate XML files from the PostgreSQL database from an XML template. So, all Gopher data from the development server (not all production patient data) makes it into PostgreSQL. Currently only a few XML templates exist, namely for the the TERM tables.

Burke has spent quite some sweat over trying a better pipe between Gopher & RDB worlds, but with little luck. We could evolve this data base export technique toward an automated process.

Uploading Gopher Dictionary to Relational Database

Gunther has taken Burke's XML export and mingled it through his tools to quickly upload parts of XML data into Oracle. Just a few tables exist right now to support Linas' and Gunther's explorations into the drug set terms in Gopher. Our goal is to compare the drug terms with NDF-RT and eventually to enhance Gopher knowledge from NDF-RT. Anyway, the data presently is on Gunther's Oracle instance to which Linas and others have access (you can have access too.)

We have one Gopher concept table:

CREATE TABLE UMLS.GOPHER_CONCEPT ( 
    ID            	NUMBER NOT NULL,
    NAME          	VARCHAR2(256) NULL,
    TYPE          	VARCHAR2(16) NULL,
    SOURCE        	VARCHAR2(64) NULL,
    CLASS         	VARCHAR2(16) NULL,
    UNITS         	VARCHAR2(64) NULL,
    PREGNANCY_RISK	VARCHAR2(1) NULL,
    PRIMARY KEY(ID)
)

And another with concept relationships, also called "role" in the literature. There are many such relationships in the Gopher somewhere, but we presently only picked 2, i.e., set membership and interaction relationship, and have not quite cleaned those up:

CREATE TABLE UMLS.GOPHER_ROLE ( 
    TYPE_ID  	VARCHAR2(64) NULL,
    SOURCE_ID	NUMBER NULL,
    TARGET_ID	NUMBER NULL,
    EXTRA_ID 	NUMBER NULL 
    )

For set membership this table is easy, SOURCE_ID is the member and TARGET_ID is the set. For interacts_with relationship, we find that the interaction relationship relates the EXTRA_ID (should be source), to the TARGET_ID. These are usually (not always, for good reasons) set terms. The SOURCE_ID seems to be redundant, it is just a member of the set referenced by EXTRA_ID.

Because Gopher sets can contain other sets, and one frequently wants to find all the elementary members of a set, we use the standard transitive closure approach along the lines of the Warshall algorithm. I.e., we first prepare a transitive set table:

DROP TABLE UMLS.GOPHER_SET_PLUS
GO
CREATE TABLE UMLS.GOPHER_SET_PLUS AS
SELECT SOURCE_ID, TARGET_ID, 1 AS DISTANCE
  FROM UMLS.GOPHER_ROLE
  WHERE TYPE_ID = 'set'

3328 record(s) affected

and then repeat the induction step

INSERT INTO UMLS.GOPHER_SET_PLUS 
SELECT R1.SOURCE_ID, R2.TARGET_ID, R1.DISTANCE+R2.DISTANCE
  FROM UMLS.GOPHER_SET_PLUS R1
    INNER JOIN UMLS.GOPHER_SET_PLUS R2
      ON(R2.SOURCE_ID = R1.TARGET_ID)
  WHERE NOT EXISTS (SELECT 1 FROM UMLS.GOPHER_SET_PLUS X
                      WHERE X.SOURCE_ID = R1.SOURCE_ID
                        AND X.TARGET_ID = R2.TARGET_ID)

until it yields no new rows. Which happens quickly in only 3 steps:

  1. 1646 record(s) affected
  2. 1293 record(s) affected
  3. 0 record(s) affected