Provider Attribution
This appears to be the most expensive and difficult query that we currently do. I (GS) am curious to understand the problem.
I don't even quite know what it is that needs to be done, but I have the code that attempts to do it, and some people have narrated the specification to me. So I start writing down what I know of the specifications. Then I think how to do it in principle based on those specifications. And thirdly I analyze the code that exists presently.
Specification
The system shall list for each patient a single primary care physician who appears to be the primary care provider for that patient.
The system shall take into consideration evidence of Patient Physician interactions. Evidence includes, but is not limited to:
- Encounters
- Orders
- Claims
The system shall weigh that evidence so as to select the most likely primary care physician for the patient.
The system shall be able to laterally suppress evidence based on other evidence. For example, the presence of an inpatient encounter would suppress evidence of other data at the same time. [I am not quite clear why]
The system shall only select a physician who is identified as a "primary care physician" and not a hospitalist.
Sketchy Design
My conceptual model is of course the HL7 v3 RIM. In the RIM all the patient provider interactions are just Acts.
Patient - participation - Act - participation - Provider
Now if we start with just this (I know there are many additional things going on), the right approach is the following:
- with interaction as select patient, provider from join patient - act - provider
- with interaction_tally as select patient, provider, count(1) from interaction group by patient, provider
- with interaction_max as select patient, max(count) from interaction_tally group by patient
- with interaction_winner as select patient, provider from interaction_max inner join interaction_tally using(patient) where count = max
By the time I have explained this in plain English, I have written the SQL:
WITH Interaction AS (
SELECT pat.id AS patientId, doc.id AS providerId
FROM Act act
INNER JOIN Participation pat ON(pat.act_id = act.id)
INNER JOIN Participation doc ON(doc.act_id = act.id)
), InteractionTally AS (
SELECT patientId, providerId, count(1) AS count
FROM Interaction
GROUP BY patientId, providerId
), InteractionMax AS (
SELECT patientId, max(count) as count
FROM InteractionTally
GROUP BY patient
), InteractionWinner AS (
SELECT patientId, providerId
FROM InteractionMax
INNER JOIN InteractionTally using(patientId, count)
)
This is the skeleton. Everything else seems just extra sugar, like:
- constraints and weighing
- constrain the Participation.types (subject, patient, performer, author, etc.)
- constrain the Act.code (encounter, substance administration order, etc.)
- constrain provider role (requires additional join(s))
- add an interaction weighing scheme
- add time based weighing
- include the join through Patient global id
- add lateral inhibition of interactions (if that is really needed)
I think you'd like to drive the constraints as well as the weighing using one approach, because they are really the same thing. If you say, "an inpatient encounter is not counted", you are saying "the weight of the inpatient encounter is zero". The only reason for actually excluding things is in order to cut down on the size of the join.
Let's put some weighing / exclusion thing in using a Weighing parameter table. That table lists all the combinations of the relevant attributes and ascribes a weight to each of them. (It is a minor implementation detail whether you do that with one table or with the cross product of multiple weighing facet tables.)
WITH Interaction AS (
SELECT pat.id AS patientId, doc.id AS providerId, w.value AS weight
FROM Act act
INNER JOIN Participation pat ON(pat.act_id = act.id)
INNER JOIN Participation doc ON(doc.act_id = act.id)
INNER JOIN Weighing w ON(
actClassCode = act.classCode,
actMoodCode = act.moodCode,
actCode = act.code,
patPartTypeCode = pat.typeCode
docPartTypeCode = doc.typeCode
...
)
)
now once we have the weight pulled in, the rest is easy, all we change is turn count(1) count into sum(weight) weight:
WITH Interaction AS (
SELECT pat.id AS patientId, doc.id AS providerId, w.value AS weight
...
), InteractionTally AS (
SELECT patientId, providerId, sum(weight) AS weight
FROM Interaction
WHERE weight > weightCutoff
GROUP BY patientId, providerId
), InteractionMax AS (
SELECT patientId, max(weight) as weight
FROM InteractionTally
GROUP BY patient
), InteractionWinner AS (
SELECT patientId, providerId
FROM InteractionMax
INNER JOIN InteractionTally using(patientId, weight)
)
You can think of the Weighing table as something that can even be calibrated.
Time based weighing would be added by a function which reduces the weight over time. This can be a square-shaped function that is 1 for a certain window of time before now and 0 thereafter. Or it can be a monotonously decreasing function from 1 down to 0. The weighing table can set that, and
WITH Interaction AS (
SELECT pat.id AS patientId, doc.id AS providerId, w.value * apply(w.decayFunction, now() - act.time) AS weight
...
)
so that's fine.
What is hard is the lateral suppression of interactions, i.e., when we start introducing dependency in the weighing algorithm. But then, the nice thing about it is that it is also just another way of doing the weights. And of course we know that adding up weights has a perfect probabilistic interpretation and really is the same idea as Fellegi Sunter. So if we want we can even estimate all that weight stuff using the EM method. But we don't have to, we can assign the weights heuristically.
The patient global index issue is just another pair of joins. No big deal in principle.
Oh, and of course the RMRS is not HL7 v3, so the whole Act and participation stuff will be a bit more complicated. But then it is not really introducing anything fundamentally new, all we do is map the various special classes and timestamps onto Acts like this:
WITH EncounterAct AS (
SELECT 'ENCOUNTER' AS classCode, 'EVN' AS moodCode, encounter_id AS id, encounterTime AS time
FROM Encounter
), OrderAct AS (
SELECT 'ACT' AS classCode, moodCode, medical_order_id AS id, orderTime AS time
), Act AS (
SELECT * FROM EncounterAct
UNION ALL
SELECT * FROM OrderAct
...
)
As a conclusion, unless I don't have all the requirements, I do have here a straight-forward approach to solving the problem in principle. Now (and only now!) we need to look into optimization issues.
Optimizing the Approach
I think it's too early to optimize because we haven't even run a test yet. But to plan a little bit what we might run up against we can do some calculations. The main Interaction table will be pretty big because we will include all Acts, that is, order of magnitude 109 items. Then we need to do a group by on that to have about 108 items in the interaction tally. Then that needs to pick the winner 107 sort and join again to the 108. Well, if that's what we have to do, then that's what we have to do. I don't see how just randomly tweaking things will change that.
Is there a principle which we can use to optimize the approach? I can't tell at this time.
Current Code
Enough from-scratch thought about this problem. Time to look at what we have as the present solution. I wish we could see the initial design. The very first design that this whole intricate code started out with.
We don't have that, so what I will do now is go through the current design commenting freely as I'm trying to make sense of it. I keep the complete unchanged code at the end. And I will copy fragments as I'm analyzing it.
Round 1: Outline
The code is a Oracle module with 5 PL/SQL procedures. So first I just put the procedures here with arguments and description comment, no code.
PROCEDURE RUN_SELECT_PATIENTS(
activity_years IN INT,
report_year IN INT,
report_month IN INT
) ...
Patient search function to perform cohort selection before RUN_PCP_SEARCH (previously part of RUN_PCP_SEARCH).
Parameters:
- activity_years - number of years old activity can be for qualifying patients in cohort. For example, 5 would indicate that anyone with activity in the last 5 years would be included in the patient cohort.
- report_year - target year for reporting PCP activity, full Gregorian year (e.g., 2007)
- report_month - target month for reporting PCP activity, number between 1 and 12 (Jan and Dec).
Lines of code: ~100
PROCEDURE RUN_PCP_SEARCH(
activity_years IN INT,
report_year IN INT,
report_month IN INT
) ...
Main search function to perform entire PCP data gathering and calculations Run RUN_SELECT_PATIENTS first, same parameters as that.
Lines of code: ~300
Given that the remaining procedures are not documented and have quite specific names, I will assume that those are helper methods:
PROCEDURE INPATIENT_LOOKUP(
report_year IN INT,
report_month IN INT
) ...
Used in RUN_PCP_SEARCH.
PROCEDURE PHYSICIAN_LOOKUP(
report_year IN INT,
report_month IN INT
) ...
Used in RUN_PCP_SEARCH.
Now this last one is special, it's called multiple times also from RUN_PCP_SEARCH:
PROCEDURE RUN_ACTIVITY_LOAD(
activity_years IN INT,
report_year IN INT,
report_month IN INT,
target_institution_id IN INT
) ...
Loads data from PARTICIPATION into PCP_ACTIVITY.
Parameters:
- activity_years - same as above
- report_year - same as above
- report_month - same as above
- target_institution_id - institution to copy
Intriguing: PCP_ACTIVITY sounds a lot like "Act". Let's see what it is. Here is some info:
INSERT INTO PCP_ACTIVITY( institution_id, patient_id, participant_provider_id, participant_number, participant_number_sys_id, participation_time, usable_yn, act_type_code
well what is that?
- patientId: composite of course (institution_id, patient_id)
- providerId: (participant_provider_id) not sure if institution_id also factors in
- (participant_number, participant_number_sys_id), some external id?
- Act.time: (participation_time)
- Act.code: (act_type_code)
- Some filtering: usable_yn
O.k. so that means that a lot of work is spent on WRITING Act records from the database just because the database design didn't like the idea of HL7 v3 RIM Acts? What if we could avoid creating all those Act records and instead just queried existing ones? Just a thought.
Round 2: Digging In
We want to dig into the 2 main methods. They are long spaghetti procedures, a joy to look at. But then we can just show the skeleton of each. My main interest is the one that sets up the Act table, and that is the RUN_PCP_SEARCH procedure.
Procedure: RUN_PCP_SEARCH
Let's just outline it to see the basic SQL statements, CURSORs and loops:
There are tons of things like that:
DBMS_OUTPUT.PUT_LINE('... some log message ...');
INSERT INTO preavis.out(msg, insert_time)
VALUES ('... same log message (typed again) ...'
sysDate );
COMMIT;
that does not make the code-gourmet happy. Typing the same string twice for a log message is really a nice way of increasing redundancy, enhancing the possibility of inconsistencies, and it sure a lot of fun for maintaining the code.
But what is more concerning is that we COMMIT here. So whenever there is a log output we COMMIT. Whatever.
The first real step says in comment:
Populate intial physician activity log for patient-physician associations
DELETE FROM PCP_ACTIVITY; COMMIT; PCP.RUN_ACTIVITY_LOAD( activity_years, report_year, report_month, 1 ); PCP.RUN_ACTIVITY_LOAD( activity_years, report_year, report_month, 2 ); PCP.RUN_ACTIVITY_LOAD( activity_years, report_year, report_month, 3 ); ...
and on and on going through a bunch of institutions. That's simple enough to understand (although I have my eyebrows raised.)
The next step says:
Look for a match to a DEFining identifier for a LOCAL_PROVIDER
and goes:
UPDATE PCP_ACTIVITY pal
SET pal.participant_provider_id = (
SELECT pi.person_id
FROM person_identifier pi
WHERE pal.participant_number = pi.identifier
AND pal.participant_number_sys_id = pi.identifier_sys_id
AND pi.identifier_assignment_code = 'DEF'
) WHERE pal.participant_provider_id IS NULL
AND pal.participant_global_id IS NULL
AND pal.usable_yn = 'Y'
AND pal.participant_number IS NOT NULL
AND pal.participant_number_sys_id IS NOT NULL;
COMMIT;
Hmm, an UPDATE is going to cause loads of logging. Reads through everything and assigns the original institution's identifier to that provider_id. Why could this not have been a simple join done when the Act table was first created? And why are we not using the Person table which I thought contained provider ids? Why is this special? Why don't we do the same Person_identifier correction for Patient?
This might make sense only if the usable_yn flag is 'N' most of the time so that we might save a lot of work. But then that could have been part of a join too. Questions, questions.
Next:
Now we look for a match to a REFining identifier for a LOCAL_PROVIDER. Note that we match to an arbitrary MIN(local_provider_id) in this case.
What, "refining"? What is being "refined" here? I thought DEF stands for DEFining and REF for referring? Am I off base or are we so deep in jargon that we forgot to express thought plainly?
UPDATE PCP_ACTIVITY pal
SET pal.participant_provider_id = (
SELECT MIN(pi.person_id)
FROM person_identifier pi
WHERE pal.participant_number = pi.identifier
AND pal.participant_number_sys_id = pi.identifier_sys_id
AND pi.identifier_assignment_code = 'REF'
) WHERE pal.participant_provider_id IS NULL
AND pal.participant_global_id IS NULL
AND pal.usable_yn = 'Y'
AND pal.participant_number IS NOT NULL
AND pal.participant_number_sys_id IS NOT NULL;
COMMIT;
This is almost the same thing as we just saw. Why the duplication? Aha, I assume it is to give precedence to DEFining identifiers and only second precedence to REFerring ones. But is it worth reading over 109 Acts twice just to fill in some missing identifiers?
Next:
Search for global_person_id for the physicians in the activity log
PCP.PHYSICIAN_LOOKUP( report_year, report_month ); COMMIT;
Oh, modular code. I see.
Next:
Mark the PARTICIPATIONS that occured during an INPATIENT visit.
PCP.INPATIENT_LOOKUP( report_year, report_month ); COMMIT;
more modular code.
Next:
Mark the PARTICIPATIONS between excluded doctor/patient combinations.
UPDATE PCP_ACTIVITY a
SET usable_yn = 'N',
not_usable_reason = 'pcp_exclusion'
WHERE EXISTS (
SELECT 1
FROM pcp_exclusion e
INNER JOIN Person p ON(e.patient_global_id = p.global_person_id)
WHERE a.participation_time <= e.effective_date
AND a.participant_global_id=e.physician_global_id
AND a.institution_id=p.institution_id
AND a.patient_id = p.person_id
)
AND usable_yn='Y';
COMMIT;
Hmm, so we really could not do this beforehand either? Every time we run this UPDATE we have to process all those Acts.
Next:
Calculate PCP for each patient, based on activity. First, we calculate PCP based on just last 18 months of activity. Next, we calculate PCP (for anyone not assinged a PCP in the 18 month pass) based on the entire 3 years worth of activity in PCP_ACTIVITY.
Looks like now it's getting hot, could it be that this is our very result table?
INSERT INTO PCP_PROVIDER_SUMMARY (
evaluation_year,
evaluation_month,
global_person_id,
participant_global_id,
participation_count,
participation_day_count,
participation_time_latest,
participation_time_earliest,
provider_freq_ran
)
SELECT report_year,
report_month,
ppl.global_person_id,
pal.participant_global_id,
count(pal.participation_time) as participation_count,
count(distinct pal.participation_time) as participation_day_count,
max(pal.participation_time) as participation_time_latest,
min(pal.participation_time) as participation_time_earliest,
RANK() OVER (
PARTITION BY ppl.global_person_id
ORDER BY count(distinct pal.participation_time) DESC,
max(participation_time) DESC
min(participation_time),
count(pal.participation_time) DESC
) as rank
FROM PCP_ACTIVITY pal
INNER JOIN pcp_patient ppl
ON( ppl.institution_id = pal.institution_id
AND ppl.patient_id = pal.patient_id
AND ppl.evaluation_year = report_year
AND ppl.evaluation_month = report_month)
WHERE pal.participant_global_id IS NOT NULL
AND usable_yn='Y'
AND pal.participation_time >= (RPT_DATE-((365*activity_years)/2))
AND ppl.origin_code = 'PATIENT'
GROUP BY ppl.global_person_id, pal.participant_global_id;
COMMIT;
This feels a lot like that ranking and weighing going on here. And there is a second INSERT statement just like the first immediately after. I assume this could have been an INSERT ... SELECT ... UNION ALL SELECT ... so now
SELECT report_year,
report_month,
ppl.global_person_id,
pal.participant_global_id,
count(pal.participation_time) as participation_count,
count(distinct pal.participation_time) as participation_day_count,
max(pal.participation_time) as participation_time_latest,
min(pal.participation_time) as participation_time_earliest,
RANK() OVER (
PARTITION BY ppl.global_person_id
ORDER BY count(distinct pal.participation_time) DESC,
max(participation_time) DESC,
min(participation_time),
count(pal.participation_time) DESC
) as rank
FROM PCP_ACTIVITY pal
INNER JOIN pcp_patient ppl
ON( ppl.institution_id = pal.institution_id
AND ppl.patient_id = pal.patient_id
AND ppl.evaluation_year = report_year
AND ppl.evaluation_month = report_month)
WHERE pal.participant_global_id IS NOT NULL
AND pal.usable_yn='Y'
AND pal.participation_time >= (RPT_DATE-((365*activity_years)))
AND ( pal.participation_time >= (RPT_DATE-((30)))
AND pal.participation_time < RPT_DATE)
AND ppl.global_person_id NOT IN (
SELECT distinct(pps.global_person_id)
FROM pcp_provider_summary pps
WHERE pps.evaluation_year = report_year
AND pps.evaluation_month = report_month
)
AND ppl.origin_code='PATIENT'
GROUP BY ppl.global_person_id,pal.participant_global_id;
COMMIT;
But we're not done, this one now is the last join of the max to the tally:
Find most commonly occuring physician for each patient
UPDATE PCP_PROVIDER_SUMMARY pps
SET pps.pcp_assigned_yn='Y'
WHERE provider_freq_rank=1
AND evaluation_year=report_year
AND evaluation_month=report_month;
COMMIT;
Hmm, o.k. so that was the gist of it.
I am concluding for today with 2 questions:
- Has this approach been chosen after or before trying a more straight-forward logical design?
- Where are we currently spending most of our resources? Is there any analysis of this as to where most of the CPU and IO is spent?
- Can we see a log file with that output so we can perhaps guess the answers? The logfile would also help with the cardinalities to understand how much data we are shuffling around while creating all these intermediary tables and updating it over and over again.
Procedure: PHYSICIAN_LOOKUP
According to John Hook, this procedure is one of the more costly ones.
This procedure is called once from RUN_PCP_SEARCH main procedure.
To understand that we need to wade through a huge amount of variable declarations:
PROCEDURE PHYSICIAN_LOOKUP(
report_year IN INT,
report_month IN INT
) IS
UPD_CNT NUMBER := 0;
I NUMBER := 0;
LKP_CNT NUMBER := 0;
TARGET_PROVIDER_ID LOCAL_PROVIDER.local_provider_id%TYPE := 0;
PREV_PROVIDER_ID LOCAL_PROVIDER.local_provider_id%TYPE := 0; -- DEAD
PCP_MIN_FLAG LOCAL_PROVIDER.pcp_yn%TYPE;
PCP_MAX_FLAG LOCAL_PROVIDER.pcp_yn%TYPE;
INST_ID LOCAL_PROVIDER.institution_id%TYPE := 0; -- DEAD
PART_TIME PCP_ACTIVITY.participation_time%TYPE; -- DEAD
GLOB_ID PERSON.global_person_id%TYPE := 0;
PREV_GLOB_ID PERSON.global_person_id%TYPE := 0; -- DEAD
TYPE prov_id_list IS TABLE OF
PCP_ACTIVITY.participant_provider_id%TYPE INDEX BY BINARY_INTEGER;
PROV_IDS prov_id_list;
PROV_IDS_CNT NUMBER := 0;
RPT_DATE DATE :=
add_months(to_date(concat(concat(concat(to_char(report_year), '/'),
to_char(report_month)), '/01'), 'yyyy/mm/dd'), 1);
Then comes a bit of comment and we begin seeing some SQL:
This CURSOR is used to read the PCP_ACTIVITY rows where we have identified a provider/doctor but have not yet connected it to the global physician occurences or checked the validity of the doctor for being a PCP.
CURSOR pcp_activity_read IS
SELECT participant_provider_id,participation_time
FROM pcp_activity
WHERE participant_global_id IS NULL
AND participant_provider_id IS NOT NULL
AND usable_yn='Y'
ORDER BY participant_provider_id,participation_time
FOR UPDATE;
BEGIN
The strange thing about the cursor named "pcp_activity_read" is that this is never referenced anywhere. Unless I am completely off base about PL/SQL, this is a piece of dead code. So I won't bother understanding it.
Now the real stuff starts. There are a lot of "SELECT ... INTO ..." calls that stuff data into arrays.
SELECT pa.participant_provider_id
BULK COLLECT INTO PROV_IDS
FROM pcp_activity pa
WHERE pa.participant_provider_id IS NOT NULL
AND pa.usable_yn='Y'
GROUP BY pa.participant_provider_id;
PROV_IDS_CNT := PROV_IDS.COUNT;
and be reminded that:
TYPE prov_id_list IS TABLE OF
PCP_ACTIVITY.participant_provider_id%TYPE INDEX BY BINARY_INTEGER;
PROV_IDS prov_id_list;
PROV_IDS_CNT NUMBER := 0;
So now this is going through all the interaction table to find the providerIds. Why? Aha:
LKP_CNT := 0;
UPD_CNT := 0;
FOR I IN 1..PROV_IDS_CNT
LOOP
TARGET_PROVIDER_ID := PROV_IDS(I);
BEGIN
so we are now looping over each such provider id. Let's see what we're trying to do with them:
SELECT institution_id
INTO INST_ID
FROM local_provider
WHERE local_provider_id = TARGET_PROVIDER_ID;
hmm, isn't LOCAL_PROVIDER_ID the primary key of LOCAL_PROVIDER? If so, why have we not just slapped this as a JOIN when we first built that Interaction table? Also, take note that this INST_ID is used only once, far down and away from here. No idea what the logic is of finding this once here and then using it miles away from here. Minor issue. [Added later: if it wasn't for the fact that this is just dead code for a useless INSERT at the end]
Next:
SELECT p1.global_person_ID,
MIN(lp1.pcp_yn),
MAX(lp1.pcp_yn)
INTO GLOB_ID, PCP_MIN_FLAG, PCP_MAX_FLAG
FROM LOCAL_PROVIDER lp -- WHERE lp.local_provider_id = TARGET_PROVIDER_ID
INNER JOIN PERSON per
ON( per.institution_id = lp.institution_id
AND per.person_id = lp.local_provider_id)
INNER JOIN PERSON p1
ON(p1.global_person_id = per.global_person_id)
LEFT OUTER JOIN LOCAL_PROVIDER lp1
ON( lp1.institution_id = p1.institution_id
AND lp1.local_provider_id = p1.person_id)
WHERE lp.local_provider_id = TARGET_PROVIDER_ID
AND ( lp1.retirement_time IS NULL
OR lp1.retirement_time >= (RPT_DATE-365))
GROUP BY p1.global_person_id;
hmm, this now gets all the provider records for this global person id, and shows what all those records think if the provider is a primary care provider or not. But this query is so peculiarly complex! 3 joins, twice to Person and a left outer join to LOCAL_PROVIDER again.
It looks like we want to look at all the pairs of
Provider><Person [globalPersonId] Person><Provider
that regard the same global person id. But the first is related to the local provider id which is the current item in the big FOR loop.
Why are we doing this? Because it looks like we simply want all the records of this person and see if any of them claims the provider as a primary care provider.
Next it comments:
ELSE try to read the global provider id value
EXCEPTION WHEN NO_DATA_FOUND THEN GLOB_ID := 0; END;
Now that's funny, in the last query there was a LEFT OUTER JOIN and I wondered why that was. The only reason is that we always want something back. But GLOB_ID is NULL if the LEFT OUTER JOIN has nothing to join. So that means the query above could have just been an inner join. Also note that GLOB_ID = 0 surely is trying to use a normal domain value as a NULL flag.
The story evolves thus:
LKP_CNT := LKP_CNT + 1;
IF GLOB_ID = 0 -- IF no global id (should not happen)
THEN
UPDATE pcp_activity
SET usable_yn='N', -- mark activity as unusable in PCP calc
not_usable_reason='id'
WHERE participant_provider_id = TARGET_PROVIDER_ID
AND participant_global_id IS NULL;
what?? We already have a participant_global_id? There seems to be a very simple piece of reasoning all burried in a bunch of nifty codes which depends on intricate understanding of PL/SQL.
Next it says:
ELSE IF physician cannot be a PCP
ELSE IF PCP_MAX_FLAG='N'
OR ( PCP_MAX_FLAG='U'
AND PCP_MIN_FLAG='N')
THEN
UPDATE pcp_activity
SET usable_yn = 'N', -- mark activity unusable for PCP calc
not_usable_reason = 'spec'
WHERE participant_provider_id = TARGET_PROVIDER_ID
AND participant_global_id IS NULL;
ELSE
UPDATE pcp_activity -- save possible PCP's global id
SET participant_global_id = GLOB_ID
WHERE participant_provider_id = TARGET_PROVIDER_ID
AND participant_global_id IS NULL;
INSERT INTO pcp_patient (
evaluation_year,
evaluation_month,
institution_id,
patient_id,
global_person_id,
origin_code
) VALUES (
report_year,
report_month,
INST_ID,
TARGET_PROVIDER_ID,
GLOB_ID,
'LOCAL_PROVIDER'
);
END IF;
END IF;
UPD_CNT := UPD_CNT + 1;
END LOOP; -- the for-each PROV_ID loop
END;
So, here it ends. Hmm, what did all of this actually accomplish?
- Interaction usable_yn was set to N if there is no provider record knows the provider as a primary care provider
- Interaction table gets a provider global person id assigned
- a row is inserted into a table called PCP_PATIENT
it strikes me that the first two points could have easily been done with a join when the Interaction tally is first built. The last bullet is some global side-effect which I have no idea yet how to think of. Whatever PCP_PATIENT is used later to be joined to the final query for building the results relations. But why in the world would we insert a PCP_PATIENT row if we are looking at PROVIDER_id? This totally makes no sense, unless PCP_PATIENT table somehow morphed in purpose. I have tried to confirm that, but I can't. PCP_PATIENT seems to be joined only with patient_id = some patient_id, not with a provider_id. So, sticking a provider_id into patient_id seems dead useless code.
How would I do this work here? I would never want to write code like this, I get high blood pressure and anxiety when I have to touch such code.
The input is PCP_ACTIVITY, the output is PCP_ACTIVITY with modifications. The main decision point is to look for provider records with same global person id and only allow for providers that have at least one PCP_YN set to 'Y' (this is nowhere stated, but all that other indirect stuff makes me conclude that I'm right.) This seems simple enough:
CREATE TABLE PCP_ACTIVITY_2 AS
SELECT act.institution_id,
act.patient_id,
act.participant_provider_id,
act.participant_number,
act.participant_number_sys_id,
act.participation_time,
MAX(usable_yn) AS usable_yn, -- this is a good logical thing to do and works if N and U < Y
act.act_type_code
FROM PCP_ACTIVITY act
INNER JOIN Person thisPerson
ON( thisPerson.person_id = act.participant_provider_id
AND thisPerson.institution_id = act.institution_id)
INNER JOIN Person otherPerson
ON(otherPerson.global_person_id = thisPerson.global_person_id)
INNER JOIN Local_Provider otherProvider
ON( otherProvider.local_provider_id = otherPerson.person_id
AND otherProvider.institution_id = otherPerson.institution_id)
WHERE usable_yn <> 'N'
GROUP BY
act.institution_id,
act.patient_id,
act.participant_provider_id,
act.participant_number,
act.participant_number_sys_id,
act.participation_time,
act.act_type_code
but then also, what's the point of carrying along Interaction records that are marked as "not to be used"? Why don't we just throw them away? In that case it gets really much easier:
CREATE TABLE PCP_ACTIVITY_2 AS
SELECT act.institution_id,
act.patient_id,
act.participant_provider_id,
act.participant_number,
act.participant_number_sys_id,
act.participation_time,
act.act_type_code
FROM PCP_ACTIVITY act
WHERE usable_yn <> 'N'
AND EXISTS (
SELECT 1
FROM Person thisPerson
INNER JOIN Person otherPerson
ON(otherPerson.global_person_id = thisPerson.global_person_id)
INNER JOIN Local_Provider otherProvider
ON( otherProvider.local_provider_id = otherPerson.person_id
AND otherProvider.institution_id = otherPerson.institution_id)
WHERE thisPerson.person_id = act.participant_provider_id
AND thisPerson.institution_id = act.institution_id
AND otherProvider.pcp_yn = 'Y'
)
See, now we don't even need to do a GROUP by. This is also easy enough of a constraint to simply add to all the other filtering of the Interaction table.
So, then again I wonder: how do we know that the complex procedure with the many updates is better than a single query that does this? Of course I would not run this query like that, I would integrate it with all the other filters over the Interaction table.
For today I conclude with two questions:
- please remove the CURSOR definition in the beginning of this procedure, this seems to have no use and might still cause useless database action.
- please remove the INSERT into PCP_PATIENT piece at the end, it seems useless work.
- also remove everything dealing with INST_ID, it too seems useless code
- look at the variable declarations marked -- DEAD above and delete them
None of these two steps will likely make this all blazingly fast, but there is no point of having dead code around when trying to get something to work.
Now that I have rewritten the whole thing into a straight filter query without even GROUP BY left, I know that my approach is better than all that UPDATE. (Recall also that John Hook is proposing similar things: avoid all that UPDATE stuff)
Procedure: INPATIENT_LOOKUP
This puzzle has possessed me now. I can't stop. Moving on with this INPATIENT_LOOKUP procedure.
PROCEDURE INPATIENT_LOOKUP(
report_year IN INT,
report_month IN INT
) IS
TYPE time_list IS
TABLE OF encounter_account.admit_time%TYPE
INDEX BY BINARY_INTEGER;
ADMIT_TIMES time_list;
DISCHARGE_TIMES time_list;
TIME_LIST_CNT NUMBER := 0;
UPD_CNT NUMBER := 0;
I NUMBER := 0;
CNT NUMBER := 0;
LKP_CNT NUMBER := 0;
GLOBAL_ID PERSON.global_person_id%TYPE := 0;
PREV_GLOBAL_ID PERSON.global_person_id%TYPE := 0;
ACTIVITY_TIME ENCOUNTER_ACCOUNT.admit_time%TYPE := NULL;
PA_ROWID UROWID;
and here again a cursor with the first SQL:
CURSOR pcp_act_read is
SELECT pp.global_person_id,
pa.participation_time,
pa.rowid
FROM pcp_activity pa
INNER JOIN pcp_patient pp
ON( pp.evaluation_year = report_year
AND pp.evaluation_month = report_month
AND pp.institution_id = pa.institution_id
AND pp.patient_id = pa.patient_id)
WHERE pa.participant_global_id IS NOT NULL
AND pa.usable_yn = 'Y'
ORDER BY pp.global_person_id, pa.participation_time;
BEGIN
OPEN pcp_act_read;
LOOP
FETCH pcp_act_read
INTO GLOBAL_ID, ACTIVITY_TIME, PA_ROWID;
EXIT WHEN pcp_act_read%NOTFOUND;
CNT := CNT + 1;
so that's just another way of doing the FOR loop from the provider thing above. Our input data is again all Interaction records. Joining with special Patient records. So not only do we copy all Acts, we also have copied all Patients before we start all that expensive work.
Now I don't promote more hacks upon more hacks, but if we need a short term fix, here is what we can do:
- write all that PCP_ACTIVITY and PCP_PATIENT thing as well as the LOCAL_PROVIDERS into an EXTERNAL datafile on the SAN.
- use a 2nd Oracle instance on a separate machine to mount these datafiles and do all expensive work thus freeing up Condor for actual use.
I'll not spend further time on that, because I'm more intesteded in proving that straight-forward SQL will be faster than all that.
So, what is the work here?
IF GLOBAL_ID != PREV_GLOBAL_ID
THEN
IF TIME_LIST_CNT > 0
THEN TIME_LIST_CNT := 0;
ADMIT_TIMES.DELETE;
DISCHARGE_TIMES.DELETE;
END IF;
so there is something about going down an ordered list emulating some sort of group by global id operation. This here is what we want to do once per group:
SELECT trunc(e.admit_time),
DECODE(e.discharge_time,
NULL,
trunc(e.admit_time+1),
trunc(e.discharge_time+1))
BULK COLLECT
INTO ADMIT_TIMES, DISCHARGE_TIMES
FROM encounter_account e
INNER JOIN PCP_PATIENT a
ON( a.evaluation_year = report_year
AND a.evaluation_month = report_month
AND a.global_person_id = GLOBAL_ID
AND a.institution_id = e.institution_id
AND a.patient_id = e.patient_id
WHERE (e.care_setting_sys_id, e.care_setting_code) IN (
SELECT target_sys_id,target_code
FROM concept_link
WHERE source_sys_id = 334
AND source_code = 'INPATIENT_CARE_SETTINGS'
AND type_code = 'EL'
AND type_sys_id = 4
GROUP BY target_sys_id, target_code
)
AND e.admit_time IS NOT NULL;
So we're looking for encounters. But whose encounters? The one of any patient with the current GLOBAL_ID. What kind of encounter? One whose "care_setting" concept is an element in a list (EL) named INPATIENT_CARE_SETTINGS. But I have no idea why we are doing this with an IN subselect and a nested group by. Why not just make a straight forward INNER JOIN:
INNER JOIN concept_link
ON( target_sys_id = e.care_setting_sys_id
AND target_code = e.care_setting_code
AND source_sys_id = 334
AND source_code = 'INPATIENT_CARE_SETTINGS'
AND type_code = 'EL'
AND type_sys_id = 4)
or just to be sure we're not duplicating rows, it should be an EXISTS subselect:
WHERE EXISTS (SELECT 1
FROM concept_link
WHERE target_sys_id = e.care_setting_sys_id
AND target_code = e.care_setting_code
AND source_sys_id = 334
AND source_code = 'INPATIENT_CARE_SETTINGS'
AND type_code = 'EL'
AND type_sys_id = 4
)
that's much clearer.
So what we have now is two parallel arrays of inpatient encounters for each of the patients. And because we're looping over a list of Interactions ordered by global patient id, we naturally want to do this searching for inpatient encounters only once.
Now comes the conclusion of the work that we wanted to do once per group.
TIME_LIST_CNT := ADMIT_TIMES.COUNT;
PREV_GLOBAL_ID := GLOBAL_ID;
LKP_CNT := LKP_CNT + 1;
END IF;
After this END IF, supposedly we are doing things for every record.
IF ACTIVITY_TIME IS NULL
THEN
UPDATE pcp_activity
SET usable_yn = 'N',
not_usable_reason = 'time'
WHERE rowid = PA_ROWID;
UPD_CNT := UPD_CNT + 1;
wait a minute? If there is no ACTIVITY_TIME, say because there was no Encounter for this patient, then we say that the Interaction is not usable? That's not fair. It should just leave things alone. Either this should never happen, or it should just not cause any such UPDATE.
Now here is what happens most of the time I suppose:
ELSE
FOR I IN 1..TIME_LIST_CNT
LOOP
IF ACTIVITY_TIME >= ADMIT_TIMES(I)
AND ACTIVITY_TIME <= DISCHARGE_TIMES(I)
THEN
UPDATE pcp_activity
SET usable_yn='N',
not_usable_reason='inp'
WHERE rowid = PA_ROWID;
UPD_CNT := UPD_CNT + 1;
EXIT;
END IF;
END LOOP;
END IF;
END LOOP;
Oh, that was it? So, this appears to be this lateral inhibition situation. That is, if there is an inpatient encounter contemporaneously with another Interaction, then we will disregard that other Interaction.
Two questions for me:
- how would I simplify this work here,
- how would I implement this lateral inhibition instead?
To the first question: I would consider this again an input/output transformation:
CREATE TABLE PCP_ACTIVITY_2 AS
SELECT act.institution_id,
act.patient_id,
act.participant_provider_id,
act.participant_number,
act.participant_number_sys_id,
act.participation_time,
act.act_type_code
FROM PCP_ACTIVITY act
WHERE usable_yn <> 'N'
AND NOT EXISTS ( -- any inpatient encounter for this patient at the same time
SELECT 1
FROM Person thisPerson -- WHERE thisPerson = act.(patient_id, institution_id)
-- standard 3-hop through global person id to other patient records
INNER JOIN Person otherPerson
ON(otherPerson.global_person_id = thisPerson.global_person_id)
INNER JOIN Patient otherPatient
ON( otherPatient.patient_id = otherPerson.person_id
AND otherPatient.institution_id = otherPerson.institution_id)
-- pick up any encounter
INNER JOIN Encounter_account encounter
ON( encounter.patient_id = otherPatient.patient_id
AND encounter.institution_id = otherPatient.institution_id)
-- pick up the inpatient encounters, notice, we don't care about dupes here, it's NOT EXIST
INNER JOIN concept_link
ON( target_sys_id = e.care_setting_sys_id
AND target_code = e.care_setting_code
AND source_sys_id = 334
AND source_code = 'INPATIENT_CARE_SETTINGS'
AND type_code = 'EL'
AND type_sys_id = 4)
-- arguments last:
WHERE thisPerson.person_id = act.patient_id
AND thisPerson.institution_id = act.institution_id
AND act.time BETWEEN encounter.admission_time AND encounter.discharge_time
)
and that's it. So, while I am talking about "lateral inhibition" here we didn't really care about that. What we did here was just go out to more data and see if something is there that we don't like. So, that's really all it is. Of course these Inpatient encounters are also Acts. I don't understand why these would completely block any other Interactions. Can't I have a visit from my PCP when I am in the hospital? I don't want to spend time on the lateral inhibition if really it may not be necessary.
Anyway, the query above would do it.
Procedure: RUN_SELECT_PATIENTS
So now we just one procedure left, and that's of interest because it defines what all goes into the Interaction table.
PROCEDURE RUN_SELECT_PATIENTS(
activity_years IN INT,
report_year IN INT,
report_month IN INT
) IS
RPT_DATE DATE := add_months(to_date(concat(concat(concat(to_char(report_year), '/'),
to_char(report_month)), '/01'), 'yyyy/mm/dd'), 1);
BEGIN
DELETE FROM pcp_patient
WHERE evaluation_year = report_year
AND evaluation_month = report_month;
why not just drop a table? This DELETE stuff is so expensive!
INSERT INTO PCP_PATIENT (
evaluation_year,
evaluation_month,
institution_id,
patient_id,
global_person_id,
origin_code)
WITH cohort_list AS (
SELECT pat.institution_id,
pat.patient_id,
p.global_person_id,
pat.registration_date,
pat.registration_end_date,
enc.max_admit,
pstat.last_activity_date,
p.origin_code
FROM PERSON p
INNER JOIN PATIENT pat
ON( pat.institution_id=p.institution_id
AND pat.patient_id=p.person_id)
LEFT OUTER JOIN ENCOUNTER_STATISTICS enc
ON( enc.institution_id=pat.institution_id
AND enc.patient_id=pat.patient_id)
LEFT OUTER JOIN PATIENT_STATISTICS pstat
ON( pstat.institution_id=pat.institution_id
AND pstat.patient_id=pat.patient_id)
INNER JOIN (
SELECT p1.global_person_id
FROM PERSON p1
WHERE p1.date_of_birth < (RPT_DATE-180)
AND p1.date_of_birth > (RPT_DATE-(365*100))
AND p1.human_yn='Y'
AND p1.origin_code='PATIENT'
AND p1.institution_id IN (1,2,3,7,9,11,14,102,103,108,109,110)
AND p1.name_last IS NOT NULL
AND p1.name_first IS NOT NULL
AND p1.home_address_zip_code_code IN (
SELECT z.zip_code
FROM zip_code z
WHERE z.zip_code LIKE '46%'
AND z.county_name IN ('BOONE','HAMILTON','HANCOCK', 'HENDRICKS','JOHNSON', 'MADISON','MARION','MORGAN','PUTNAM')
)
MINUS
SELECT p2.global_person_id
FROM PERSON p2
WHERE p2.human_yn='N'
OR p2.date_of_death < (RPT_DATE-180)
) persons
ON(persons.global_person_id = p.global_person_id)
WHERE p.institution_id IN (1,2,3,7,9,11,14,102,103,108,109,110)
)
SELECT report_year,
report_month,
co.institution_id,
co.patient_id,
co.global_person_id,
co.origin_code
FROM cohort_list co
WHERE co.global_person_id IN (
SELECT co1.global_person_id
FROM cohort_list co1
WHERE co1.registration_end_date >= (RPT_DATE-(365*activity_years))
OR co1.last_activity_date >= (RPT_DATE-(365*activity_years))
OR co1.max_admit >= (RPT_DATE-(365*activity_years))
OR co1.registration_date >= (RPT_DATE-(365*activity_years))
);
So that's all. If I ruled I would take the form col1 IN (SELECT col2 FROM ...) away from SQL. I have never seen it do any good. Really these here are just a bunch of suppression rules to get rid of patients that are dead or don't live here. Nothing special.
Conclusion
So, I have gone through all the 4 main procedures. I believe that all cases of UPDATE and DELETE should be revised and replaced with simple filter queries. I think the PCP provider and the INPATIENT filter can both be executed in the same big query that filters the Interactions. But I too am wounded from my endless fights against the Oracle query optimizer. Still, DELETEs and UPDATES can never be better.
For a quick fix, because we're already writing the Interaction table and a special patient table, we could write those to a temporary transportable tablespace and then ship it off to a second Oracle instance to do the main crunching. That would be faster in any case. However, it would also be a bit more complex to manage. But not worse than what I'm hearing happens now.
I need to come back to the main query, to rewrite it to use my 4-part query with
- Interaction,
- InteractionTally?,
- InteractionMax? and
- InteractionWinner?.
From now on it becomes really tedious work because of all that special stuff in the RMRS data model. I am pleased that my Participation design appears to be well accepted these days. I wish the Acts were also here.
Nevertheless, with a view that exposes all these interactions as acts, we could make a UNION of all of them. It's not impossible.
Once we have the Act table, I would join onto it the global person ids of all participants right away. It seems to me those are needed a lot.
Determining the PCP status of providers ought to be a good thing to do separately of this, because there aren't so many providers that this can't be taking longer than an hour. With a simple provider PCP status table, the filtering of Interactions with non-PCPs is no big deal.
Doing the PCP_PATIENT up front seemed to be a good idea. Just that I would prefer to have replicate entries in there by local institution id. Because the heavy work is to find all those Interactions, so better have the join from Person to Patient pre-computed in the PCP_PATIENT table.
I would do no UPDATES at all ever. And no DELETE ever. Instead CREATE TABLE and DROP.
Original Unchanged Code
create or replace
PACKAGE BODY PCP AS
PROCEDURE RUN_SELECT_PATIENTS(
activity_years IN INT,
report_year IN INT,
report_month IN INT
) IS
/** Patient search function to perform cohort selection before
RUN_PCP_SEARCH (previously part of RUN_PCP_SEARCH)
Parameters:
activity_years - numb of years old activity can be
for qualifying patients in cohort
For example, "5" would indicate that anyone
with activity in the last 5 years
would be included in the patient cohort.
report_year,
report_month - target month for reporting pcp activity
For example, year should be in format like "2007"
while month is number between 1 and 12 (Jan
and Dec)
**/
RPT_DATE DATE :=
add_months(to_date(concat(concat(concat(to_char(report_year), '/'),
to_char(report_month)), '/01'), 'yyyy/mm/dd'), 1);
BEGIN
/** Populate the table with PATIENTs for which PCP should
be gathered
**/
DELETE FROM pcp_patient WHERE evaluation_year = report_year
AND evaluation_month = report_month;
INSERT INTO
PCP_PATIENT(evaluation_year,evaluation_month,institution_id,patient_id,global_person_id,origin_code)
WITH cohort_list AS
( SELECT pat.institution_id
,pat.patient_id
,p.global_person_id
,pat.registration_date
,pat.registration_end_date
,enc.max_admit
,pstat.last_activity_date
,p.origin_code
FROM PERSON p
INNER JOIN PATIENT pat on
pat.institution_id=p.institution_id AND pat.patient_id=p.person_id
LEFT OUTER JOIN rmrs.ENCOUNTER_STATISTICS enc
ON enc.institution_id=pat.institution_id AND
enc.patient_id=pat.patient_id
LEFT OUTER JOIN rmrs.PATIENT_STATISTICS pstat
ON pstat.institution_id=pat.institution_id AND
pstat.patient_id=pat.patient_id
INNER JOIN
(SELECT p1.global_person_id
FROM PERSON p1
WHERE p1.date_of_birth < (RPT_DATE-180) and
p1.date_of_birth > (RPT_DATE-(365*100))
AND p1.human_yn='Y'
AND p1.origin_code='PATIENT'
AND p1.institution_id IN
(1,2,3,7,9,11,14,102,103,108,109,110)
AND p1.name_last IS NOT NULL AND p1.name_first IS
NOT NULL
AND p1.home_address_zip_code_code IN
( SELECT z.zip_code FROM zip_code z WHERE z.zip_code
LIKE '46%' AND z.county_name IN
('BOONE','HAMILTON','HANCOCK','HENDRICKS','JOHNSON',
'MADISON','MARION','MORGAN','PUTNAM'
)
)
MINUS
SELECT p2.global_person_id
FROM PERSON p2
WHERE p2.human_yn='N' OR p2.date_of_death < (RPT_DATE-180)
) persons ON persons.global_person_id=p.global_person_id
WHERE p.institution_id IN
(1,2,3,7,9,11,14,102,103,108,109,110)
)
SELECT report_year, report_month,
co.institution_id,co.patient_id,co.global_person_id,co.origin_code
FROM cohort_list co
WHERE co.global_person_id IN
( SELECT co1.global_person_id
FROM cohort_list co1
WHERE co1.registration_end_date >=
(RPT_DATE-(365*activity_years))
OR co1.last_activity_date >=
(RPT_DATE-(365*activity_years))
OR co1.max_admit >=
(RPT_DATE-(365*activity_years))
OR co1.registration_date >=
(RPT_DATE-(365*activity_years))
);
COMMIT;
DBMS_OUTPUT.PUT_LINE( to_char(sysdate,'dd-mon-yyyy
HH24:MI:SS')||' -- END' );
END;
PROCEDURE RUN_PCP_SEARCH(
activity_years IN INT,
report_year IN INT,
report_month IN INT
) IS
/** Main search function to perform entire PCP data gathering and
calculations (run RUN_SELECT_PATIENTS first)
Parameters:
activity_years - numb of years old activity can be
for qualifying patients in cohort
For example, "5" would indicate that anyone
with activity in the last 5 years
would be included in the patient cohort.
report_year,
report_month - target month for reporting pcp activity
For example, year should be in format like "2007"
while month is number between 1 and 12 (Jan
and Dec)
**/
MSG_CNT NUMBER := 0;
RPT_DATE DATE :=
add_months(to_date(concat(concat(concat(to_char(report_year), '/'),
to_char(report_month)), '/01'), 'yyyy/mm/dd'), 1);
BEGIN
DBMS_OUTPUT.PUT_LINE( 'Begin PARTICIPATION search at ' ||
to_char(sysdate,'YYYY-MON-DD HH24:MM') );
insert into preavis.out(msg, insert_time) values ( 'Begin
PARTICIPATION search at ' || to_char(sysdate,'YYYY-MON-DD HH24:MM'),
sysDate );
COMMIT;
/** Populate intial physician activity log for
patient-physician associations
**/
DELETE FROM PCP_ACTIVITY;
COMMIT;
/** This institution list should be based on an
INSTITUTION_TREE.
We don't want it to be hard-coded, but we also can't use
everything in INSTITUTION.
We don't have permission to use everything.
**/
PCP.RUN_ACTIVITY_LOAD( activity_years, report_year,
report_month, 1 );
PCP.RUN_ACTIVITY_LOAD( activity_years, report_year,
report_month, 2 );
PCP.RUN_ACTIVITY_LOAD( activity_years, report_year,
report_month, 3 );
PCP.RUN_ACTIVITY_LOAD( activity_years, report_year,
report_month, 7 );
PCP.RUN_ACTIVITY_LOAD( activity_years, report_year,
report_month, 9 );
PCP.RUN_ACTIVITY_LOAD( activity_years, report_year,
report_month, 11 );
PCP.RUN_ACTIVITY_LOAD( activity_years, report_year,
report_month, 14 );
PCP.RUN_ACTIVITY_LOAD( activity_years, report_year,
report_month, 102 );
PCP.RUN_ACTIVITY_LOAD( activity_years, report_year,
report_month, 103 );
PCP.RUN_ACTIVITY_LOAD( activity_years, report_year,
report_month, 108 );
PCP.RUN_ACTIVITY_LOAD( activity_years, report_year,
report_month, 109 );
PCP.RUN_ACTIVITY_LOAD( activity_years, report_year,
report_month, 110 );
select count(*) into MSG_CNT from PCP_ACTIVITY;
insert into preavis.out(msg, insert_time) values (
'a:'||MSG_CNT, sysDate );
COMMIT;
DBMS_OUTPUT.PUT_LINE( to_char(sysdate,'dd-mon-yyyy
HH24:MI:SS')||' -- Find DEF references for activity physicians' );
insert into preavis.out(msg, insert_time) values (
to_char(sysdate,'dd-mon-yyyy HH24:MI:SS')||' -- Find DEF references for
activity physicians', sysDate );
COMMIT;
/** Look for a match to a DEFining identifier for a
LOCAL_PROVIDER
**/
UPDATE PCP_ACTIVITY pal
SET pal.participant_provider_id=
(SELECT pi.person_id
FROM person_identifier pi
WHERE pal.participant_number=pi.identifier AND
pal.participant_number_sys_id=pi.identifier_sys_id
AND pi.identifier_assignment_code='DEF'
) WHERE pal.participant_provider_id IS NULL
AND pal.participant_global_id IS NULL
AND pal.usable_yn='Y'
AND pal.participant_number IS NOT NULL
AND pal.participant_number_sys_id IS NOT NULL
;
COMMIT;
DBMS_OUTPUT.PUT_LINE( to_char(sysdate,'dd-mon-yyyy
HH24:MI:SS')||' -- Find REF physician references from PCP_ACTITIVTY' );
insert into preavis.out(msg, insert_time) values (
to_char(sysdate,'dd-mon-yyyy HH24:MI:SS')||' -- Find REF physician
references from PCP_ACTITIVTY', sysDate );
COMMIT;
/** Now we look for a match to a REFining identifier for a
LOCAL_PROVIDER.
** Note that we match to an arbitrary MIN(local_provider_id) in
this case.
**/
UPDATE PCP_ACTIVITY pal
SET pal.participant_provider_id=
(SELECT MIN(pi.person_id)
FROM person_identifier pi
WHERE pal.participant_number=pi.identifier AND
pal.participant_number_sys_id=pi.identifier_sys_id
AND pi.identifier_assignment_code='REF'
) WHERE pal.participant_provider_id IS NULL
AND pal.participant_global_id IS NULL
AND pal.usable_yn='Y'
AND pal.participant_number IS NOT NULL
AND pal.participant_number_sys_id IS NOT NULL
;
COMMIT;
/** Search for global_person_id for the physicians in the
activity log
**/
DBMS_OUTPUT.PUT_LINE( to_char(sysdate,'dd-mon-yyyy
HH24:MI:SS')||' -- Find GLOBAL ID for activity physicians' );
insert into preavis.out(msg, insert_time) values (
to_char(sysdate,'dd-mon-yyyy HH24:MI:SS')||' -- Find GLOBAL ID for
activity physicians', sysDate );
COMMIT;
PCP.PHYSICIAN_LOOKUP( report_year, report_month );
COMMIT;
/** Mark the PARTICIPATIONS that occured during an INPATIENT
visit.
**/
DBMS_OUTPUT.PUT_LINE( to_char(sysdate,'dd-mon-yyyy
HH24:MI:SS')||' -- Find PCP_ACTIVITY during INPATIENT admissions' );
insert into preavis.out(msg, insert_time) values (
to_char(sysdate,'dd-mon-yyyy HH24:MI:SS')||' -- Find PCP_ACTIVITY during
INPATIENT admissions', sysDate );
COMMIT;
PCP.INPATIENT_LOOKUP( report_year, report_month );
COMMIT;
/** Mark the PARTICIPATIONS between excluded doctor/patient
combinations.
**/
DBMS_OUTPUT.PUT_LINE( to_char(sysdate,'dd-mon-yyyy
HH24:MI:SS')||' -- Find PCP_ACTIVITY between PCP_EXCLUSION
doctor/patient combinations' );
insert into preavis.out(msg, insert_time) values (
to_char(sysdate,'dd-mon-yyyy HH24:MI:SS')||' -- Find PCP_ACTIVITY
between PCP_EXCLUSION doctor/patient combinations', sysDate );
COMMIT;
update pcp_activity a
set usable_yn='N',not_usable_reason='pcp_exclusion'
where exists
(
select 1
from pcp_exclusion e
inner join person p
on e.patient_global_id=p.global_person_id
where a.participation_time <= e.effective_date
and a.participant_global_id=e.physician_global_id
and a.institution_id=p.institution_id
and a.patient_id=p.person_id
)
and usable_yn='Y';
COMMIT;
/** Calculate PCP for each patient, based on activity.
First, we calculate PCP based on just last 18 months of
activity.
Next, we calculate PCP (for anyone not assinged a PCP in the
18 month pass)
based on the entire 3 years worth of activity in PCP_ACTIVITY.
**/
DBMS_OUTPUT.PUT_LINE( to_char(sysdate,'dd-mon-yyyy
HH24:MI:SS')||' -- Populate PCP_PROVIDER_SUMMARY from PCP_ACTIVITY' );
insert into preavis.out(msg, insert_time) values (
to_char(sysdate,'dd-mon-yyyy HH24:MI:SS')||' -- Populate
PCP_PROVIDER_SUMMARY from PCP_ACTIVITY', sysDate );
COMMIT;
INSERT INTO PCP_PROVIDER_SUMMARY
(evaluation_year,evaluation_month
,global_person_id,participant_global_id,participation_count,participation_day_count
,participation_time_latest,participation_time_earliest,provider_freq_rank)
SELECT report_year, report_month
,ppl.global_person_id,pal.participant_global_id
,count(pal.participation_time) as participation_count
,count(distinct pal.participation_time) as
participation_day_count
,max(pal.participation_time) as participation_time_latest
,min(pal.participation_time) as participation_time_earliest
,RANK() OVER
(PARTITION BY ppl.global_person_id
ORDER BY count(distinct pal.participation_time) DESC
,max(participation_time) DESC
,min(participation_time)
,count(pal.participation_time) DESC
) as rank
FROM PCP_ACTIVITY pal
INNER JOIN pcp_patient ppl ON
ppl.institution_id=pal.institution_id AND ppl.patient_id=pal.patient_id
AND ppl.evaluation_year=report_year AND
ppl.evaluation_month=report_month
WHERE pal.participant_global_id IS NOT NULL AND usable_yn='Y'
AND pal.participation_time >=
(RPT_DATE-((365*activity_years)/2))
AND ppl.origin_code='PATIENT'
GROUP BY ppl.global_person_id,pal.participant_global_id
;
COMMIT;
INSERT INTO PCP_PROVIDER_SUMMARY
(evaluation_year,evaluation_month
,global_person_id,participant_global_id,participation_count,participation_day_count
,participation_time_latest,participation_time_earliest,provider_freq_rank)
SELECT report_year, report_month
,ppl.global_person_id,pal.participant_global_id
,count(pal.participation_time) as participation_count
,count(distinct pal.participation_time) as
participation_day_count
,max(pal.participation_time) as participation_time_latest
,min(pal.participation_time) as participation_time_earliest
,RANK() OVER
(PARTITION BY ppl.global_person_id
ORDER BY count(distinct pal.participation_time) DESC
,max(participation_time) DESC
,min(participation_time)
,count(pal.participation_time) DESC
) as rank
FROM PCP_ACTIVITY pal
INNER JOIN pcp_patient ppl ON
ppl.institution_id=pal.institution_id AND ppl.patient_id=pal.patient_id
AND ppl.evaluation_year=report_year AND
ppl.evaluation_month=report_month
WHERE pal.participant_global_id IS NOT NULL
AND pal.usable_yn='Y'
AND pal.participation_time >= (RPT_DATE-((365*activity_years)))
AND (pal.participation_time >= (RPT_DATE-((30))) and
pal.participation_time < RPT_DATE)
AND ppl.global_person_id NOT IN
(SELECT distinct(pps.global_person_id)
FROM pcp_provider_summary pps
WHERE pps.evaluation_year=report_year AND
pps.evaluation_month=report_month
)
AND ppl.origin_code='PATIENT'
GROUP BY ppl.global_person_id,pal.participant_global_id
;
COMMIT;
select count(*) into MSG_CNT from
PCP_PROVIDER_SUMMARY;
insert into preavis.out(msg, insert_time) values (
'e:'||MSG_CNT, sysDate );
COMMIT;
DBMS_OUTPUT.PUT_LINE( to_char(sysdate,'dd-mon-yyyy
HH24:MI:SS')||' -- Mark PCPs as global physicians with most activity per
patient' );
insert into preavis.out(msg, insert_time) values (
to_char(sysdate,'dd-mon-yyyy HH24:MI:SS')||' -- Mark PCPs as global
physicians with most activity per patient', sysDate );
COMMIT;
/** Find most commonly occuring physician for each patient
**/
UPDATE PCP_PROVIDER_SUMMARY pps
SET pps.pcp_assigned_yn='Y'
WHERE provider_freq_rank=1
AND evaluation_year=report_year AND
evaluation_month=report_month;
COMMIT;
DBMS_OUTPUT.PUT_LINE( to_char(sysdate,'dd-mon-yyyy
HH24:MI:SS')||' -- END' );
insert into preavis.out(msg, insert_time) values (
to_char(sysdate,'dd-mon-yyyy HH24:MI:SS')||' -- END', sysDate );
COMMIT;
END;
PROCEDURE INPATIENT_LOOKUP(
report_year IN INT,
report_month IN INT
) IS
TYPE time_list IS TABLE OF encounter_account.admit_time%TYPE
INDEX BY BINARY_INTEGER;
ADMIT_TIMES time_list;
DISCHARGE_TIMES time_list;
TIME_LIST_CNT NUMBER := 0;
UPD_CNT NUMBER := 0;
I NUMBER := 0;
CNT NUMBER := 0;
LKP_CNT NUMBER := 0;
GLOBAL_ID PERSON.global_person_id%TYPE := 0;
PREV_GLOBAL_ID PERSON.global_person_id%TYPE
:= 0;
ACTIVITY_TIME ENCOUNTER_ACCOUNT.admit_time%TYPE := NULL;
PA_ROWID UROWID;
cursor pcp_act_read is
SELECT
pp.global_person_id,pa.participation_time,pa.rowid
FROM pcp_activity pa
INNER JOIN pcp_patient pp
ON pp.evaluation_year=report_year AND
pp.evaluation_month=report_month
AND pp.institution_id=pa.institution_id AND
pp.patient_id=pa.patient_id
WHERE pa.participant_global_id IS NOT NULL AND
pa.usable_yn='Y'
ORDER BY
pp.global_person_id,pa.participation_time
;
BEGIN
dbms_output.put_line( '-- Starting PCP_INPATIENT procedure
'||to_char(sysdate,'YYYY-MON-DD HH24:MM') );
insert into preavis.out(msg, insert_time) values ( '--
Starting PCP_INPATIENT procedure '||to_char(sysdate,'YYYY-MON-DD
HH24:MM'), sysDate );
COMMIT;
OPEN pcp_act_read;
LOOP
FETCH pcp_act_read INTO GLOBAL_ID, ACTIVITY_TIME, PA_ROWID;
EXIT WHEN pcp_act_read%NOTFOUND;
CNT := CNT + 1;
IF GLOBAL_ID != PREV_GLOBAL_ID
THEN IF TIME_LIST_CNT > 0
THEN TIME_LIST_CNT := 0;
ADMIT_TIMES.DELETE;
DISCHARGE_TIMES.DELETE;
END IF;
SELECT
trunc(e.admit_time),DECODE(e.discharge_time,NULL
,trunc(e.admit_time+1),trunc(e.discharge_time+1))
BULK COLLECT
INTO ADMIT_TIMES, DISCHARGE_TIMES
FROM encounter_account e
INNER JOIN PCP_PATIENT a
ON a.evaluation_year=report_year AND
a.evaluation_month=report_month
AND a.global_person_id = GLOBAL_ID
AND a.institution_id=e.institution_id AND
a.patient_id=e.patient_id
WHERE (e.care_setting_sys_id,e.care_setting_code) IN
(SELECT target_sys_id,target_code FROM concept_link
WHERE source_sys_id=334 and
source_code='INPATIENT_CARE_SETTINGS'
AND type_code='EL' and type_sys_id=4
GROUP BY target_sys_id,target_code
)
AND e.admit_time IS NOT NULL
;
TIME_LIST_CNT := ADMIT_TIMES.COUNT;
PREV_GLOBAL_ID := GLOBAL_ID;
LKP_CNT := LKP_CNT + 1;
END IF;
IF ACTIVITY_TIME IS NULL
THEN UPDATE pcp_activity
SET usable_yn='N',not_usable_reason='time'
WHERE rowid = PA_ROWID;
UPD_CNT := UPD_CNT + 1;
ELSE FOR I IN 1..TIME_LIST_CNT
LOOP
IF ACTIVITY_TIME >= ADMIT_TIMES(I) AND
ACTIVITY_TIME <= DISCHARGE_TIMES(I)
THEN UPDATE pcp_activity
SET usable_yn='N', not_usable_reason='inp'
WHERE rowid = PA_ROWID;
UPD_CNT := UPD_CNT + 1;
EXIT;
END IF;
END LOOP;
END IF;
END LOOP;
COMMIT;
dbms_output.put_line( '-- Ending PCP_INPATIENT procedure
'||to_char(sysdate,'YYYY-MON-DD HH24:MM') );
dbms_output.put_line( ' Read CNT = '||CNT );
dbms_output.put_line( ' Update CNT = '||UPD_CNT );
dbms_output.put_line( 'Visit Lookup CNT = '||LKP_CNT );
insert into preavis.out(msg, insert_time) values ( '--
Ending PCP_INPATIENT procedure '||to_char(sysdate,'YYYY-MON-DD
HH24:MM'), sysDate );
insert into preavis.out(msg, insert_time) values ( '
Read CNT = '||CNT, sysDate );
insert into preavis.out(msg, insert_time) values ( '
Update CNT = '||UPD_CNT, sysDate );
insert into preavis.out(msg, insert_time) values ( 'Visit
Lookup CNT = '||LKP_CNT, sysDate );
COMMIT;
CLOSE pcp_act_read;
END;
PROCEDURE PHYSICIAN_LOOKUP(
report_year IN INT,
report_month IN INT
) IS
UPD_CNT NUMBER := 0;
I NUMBER := 0;
LKP_CNT NUMBER := 0;
TARGET_PROVIDER_ID
LOCAL_PROVIDER.local_provider_id%TYPE := 0;
PREV_PROVIDER_ID LOCAL_PROVIDER.local_provider_id%TYPE
:= 0;
PCP_MIN_FLAG LOCAL_PROVIDER.pcp_yn%TYPE;
PCP_MAX_FLAG LOCAL_PROVIDER.pcp_yn%TYPE;
INST_ID LOCAL_PROVIDER.institution_id%TYPE
:= 0;
PART_TIME PCP_ACTIVITY.participation_time%TYPE;
GLOB_ID PERSON.global_person_id%TYPE := 0;
PREV_GLOB_ID PERSON.global_person_id%TYPE := 0;
TYPE prov_id_list IS TABLE OF
PCP_ACTIVITY.participant_provider_id%TYPE INDEX BY BINARY_INTEGER;
PROV_IDS prov_id_list;
PROV_IDS_CNT NUMBER := 0;
RPT_DATE DATE :=
add_months(to_date(concat(concat(concat(to_char(report_year), '/'),
to_char(report_month)), '/01'), 'yyyy/mm/dd'), 1);
/** This CURSOR is used to read the PCP_ACTIVITY rows
where we have identified a provider/doctor but have
not yet connected it to the global physician occurences
or checked the validity of
the doctor for being a PCP.
**/
CURSOR pcp_activity_read IS
SELECT participant_provider_id,participation_time
FROM pcp_activity
WHERE participant_global_id IS NULL
AND participant_provider_id IS NOT NULL
AND usable_yn='Y'
ORDER BY participant_provider_id,participation_time
FOR UPDATE
;
BEGIN
DBMS_OUTPUT.PUT_LINE( 'Beginning PCP_LOOKUP at ' ||
to_char(sysdate,'YYYY-MON-DD HH24:MM') );
insert into preavis.out(msg, insert_time) values (
'Beginning PCP_LOOKUP at ' || to_char(sysdate,'YYYY-MON-DD HH24:MM'),
sysDate );
COMMIT;
SELECT pa.participant_provider_id
BULK COLLECT
INTO PROV_IDS
FROM pcp_activity pa
WHERE pa.participant_provider_id IS NOT NULL
AND pa.usable_yn='Y'
GROUP BY pa.participant_provider_id
;
PROV_IDS_CNT := PROV_IDS.COUNT;
LKP_CNT := 0;
UPD_CNT := 0;
FOR I IN 1..PROV_IDS_CNT
LOOP
TARGET_PROVIDER_ID := PROV_IDS(I) ;
BEGIN
SELECT institution_id
INTO INST_ID
FROM local_provider
WHERE local_provider_id = TARGET_PROVIDER_ID
;
SELECT p1.global_person_ID, MIN(lp1.pcp_yn),
MAX(lp1.pcp_yn)
INTO GLOB_ID, PCP_MIN_FLAG, PCP_MAX_FLAG
FROM LOCAL_PROVIDER lp
INNER JOIN PERSON per on
lp.institution_id=per.institution_id
AND lp.local_provider_id=per.person_id
INNER JOIN PERSON p1 ON
p1.global_person_id=per.global_person_id
LEFT OUTER JOIN LOCAL_PROVIDER lp1 ON
lp1.institution_id=p1.institution_id
AND lp1.local_provider_id=p1.person_id
WHERE lp.local_provider_id = TARGET_PROVIDER_ID
AND (lp1.retirement_time IS NULL OR
lp1.retirement_time >= (RPT_DATE-365))
GROUP BY p1.global_person_id
;
/** ELSE try to read the global provider
id value **/
EXCEPTION
WHEN NO_DATA_FOUND THEN GLOB_ID := 0;
END;
LKP_CNT := LKP_CNT + 1;
IF GLOB_ID = 0 /** IF no global id
(should not happen) **/
THEN UPDATE pcp_activity
SET usable_yn='N', not_usable_reason='id'
WHERE participant_provider_id =
TARGET_PROVIDER_ID
AND participant_global_id IS NULL;
/** THEN mark activity as unusable
in PCP calc **/
ELSE IF PCP_MAX_FLAG='N' OR (PCP_MAX_FLAG='U' AND
PCP_MIN_FLAG='N')
/** ELSE IF physician cannot be
a PCP **/
THEN UPDATE pcp_activity
SET usable_yn='N',not_usable_reason='spec'
WHERE participant_provider_id =
TARGET_PROVIDER_ID
AND participant_global_id IS NULL;
/** THEN mark activity
unusable for PCP calc **/
ELSE UPDATE pcp_activity
SET participant_global_id = GLOB_ID
WHERE participant_provider_id =
TARGET_PROVIDER_ID
AND participant_global_id IS NULL;
/** ELSE save possible PCP's
global id **/
INSERT INTO pcp_patient
(evaluation_year,evaluation_month,institution_id,patient_id,global_person_id,origin_code)
VALUES
(report_year,report_month,INST_ID,TARGET_PROVIDER_ID,GLOB_ID,'LOCAL_PROVIDER')
;
END IF;
END IF;
UPD_CNT := UPD_CNT + 1;
END LOOP;
DBMS_OUTPUT.PUT_LINE( ' Physician_Lookup completed at:
'||to_char(sysdate,'YYYY-MON-DD HH24:MM') );
DBMS_OUTPUT.PUT_LINE( ' PCP Lookup Count: '||LKP_CNT );
DBMS_OUTPUT.PUT_LINE( ' Update Count: '||UPD_CNT );
insert into preavis.out(msg, insert_time) values ( '
Physician_Lookup completed at: '||to_char(sysdate,'YYYY-MON-DD
HH24:MM'), sysDate );
insert into preavis.out(msg, insert_time) values ( ' PCP
Lookup Count: '||LKP_CNT, sysDate );
insert into preavis.out(msg, insert_time) values ( '
Update Count: '||UPD_CNT, sysDate );
COMMIT;
END;
PROCEDURE RUN_ACTIVITY_LOAD(
activity_years IN INT,
report_year IN INT,
report_month IN INT,
target_institution_id IN INT
) IS
/** Loads data from PARTICIPATION into PCP_ACTIVITY
Parameters:
activity_years - numb of years old activity can be
for qualifying patients in cohort
For example, "5" would indicate that anyone
with activity in the last 5 years
would be included in the patient cohort.
report_year,
report_month - target month for reporting pcp activity
For example, year should be in format like "2007"
while month is number between 1 and 12 (Jan
and Dec)
target_institution_id - institution to copy
**/
RPT_DATE DATE :=
add_months(to_date(concat(concat(concat(to_char(report_year), '/'),
to_char(report_month)), '/01'), 'yyyy/mm/dd'), 1);
BEGIN
DBMS_OUTPUT.PUT_LINE( 'Begin ' || target_institution_id || '
PARTICIPATION search at ' || to_char(sysdate,'YYYY-MON-DD HH24:MM') );
insert into preavis.out(msg, insert_time) values ( 'Begin '
|| target_institution_id || ' PARTICIPATION search at ' ||
to_char(sysdate,'YYYY-MON-DD HH24:MM'), sysDate );
COMMIT;
/** Populate intial physician activity log for
patient-physician associations
**/
INSERT
INTO PCP_ACTIVITY(institution_id,patient_id
,participant_provider_id,participant_number,participant_number_sys_id,participation_time
,usable_yn,act_type_code)
SELECT /*+ full(ppl) full(p) */
ppl.institution_id,ppl.patient_id
,p.participant_provider_id,p.participant_number,p.participant_number_sys_id
,event_time,'Y','E'
FROM PARTICIPATION p
INNER JOIN PCP_PATIENT ppl
ON ppl.institution_id=p.institution_id AND
ppl.patient_id=p.patient_id
AND ppl.evaluation_year=report_year AND
ppl.evaluation_month=report_month
WHERE (p.participation_type_sys_id,p.participation_type_code) IN
(SELECT target_sys_id,target_code FROM concept_link
WHERE source_sys_id=334 and
source_code='PARTICIPATION_TYPES'
AND type_code='EL' and type_sys_id=4
GROUP BY target_sys_id,target_code)
AND ( (p.participant_provider_id IS NOT NULL)
OR
(p.participant_number IS NOT NULL AND
p.participant_number<>'99999' AND p.participant_number_sys_id IS NOT NULL)
)
AND (p.event_time >= (RPT_DATE-(365*activity_years)))
AND p.institution_id=target_institution_id
;
COMMIT;
END;
END PCP;
