Tuesday, March 26, 2019

Code Kata - Running Analysis

A code kata is a quick programming challenge where you try to solve a problem. The goal is to become a better programmer by taking on quick challenges that are different from the stuff you code every day. You take a problem and give yourself only 30 minutes to work on it. The idea is that you activate your brain while learning new skills. It is good to work on a code kata first thing in the day, then start your normal work (I never do, but that is the idea). This code kata takes my heart rate history and compares two runs to see just how bad my heart rate has become. It involved building a PostgreSQL database, parsing TCX files and saving them to tables, then pulling the data back out and plotting it with Matplotlib and Pandas.

I was exercising consistently up to early February. On Feb 11th I did a heart rate test run: 6.2 mph for 3 miles on the treadmill. The idea is that if I run the same speed and measure my heart rate at a different time in the year, I can see whether my aerobic fitness has improved or not. Well, then I didn't run for a month and a half. Not trying to kill myself, but life happens. So I took the opportunity to test my heart health again and see the differences. All for science.

Below you can see the plot of the two runs. The cross plot shows seconds versus heart rate (bpm). Run #1 was when I was training consistently and Run #2 was after a month and a half off. The difference averaged about 8 bpm when running at the same speed. Yeah, I stopped 1/2 mile early on Run #2. I had collected enough data. :)

[Plot: heart rate (bpm) versus seconds for Run #1 and Run #2]
If you keep reading you will get some code samples. I will eventually publish my code on GitHub after I polish it up a bit. These are just samples of how I pieced the different parts together.

The same treadmill and heart rate monitor were used on both runs: a Polar H10, which connects to my iPhone via Bluetooth. Sessions are uploaded via the Polar Beat app, and I can download a TCX file with the details: time, heart rate, latitude, and longitude. Polar's website and app are sufficient and give some statistics. The problem is that you can't compare two sessions against each other.

When I created the database I saved the table-building queries inside of a method so I can quickly regenerate them in a new database. I have the details stored in an INTERVAL table and the general data stored in a SESSION table.
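Here is a minimal sketch of what that method might look like using psycopg2; the column names are illustrative rather than my exact schema.

import psycopg2

def create_tables(conn):
    # Regenerate the SESSION and INTERVAL tables in a fresh database.
    # Column names are illustrative; adjust them to your own schema.
    with conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS session (
                session_id SERIAL PRIMARY KEY,
                sport      TEXT,
                start_time TIMESTAMP
            )""")
        cur.execute("""
            CREATE TABLE IF NOT EXISTS interval (
                interval_id SERIAL PRIMARY KEY,
                session_id  INTEGER REFERENCES session (session_id),
                point_time  TIMESTAMP,
                heart_rate  INTEGER,
                latitude    DOUBLE PRECISION,
                longitude   DOUBLE PRECISION
            )""")
    conn.commit()

conn = psycopg2.connect("dbname=runs user=jon")  # example connection string
create_tables(conn)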

Parsing a TCX file is the same as parsing an XML file. I could have used Python's XML parsing library, but I split out the pieces myself and saved them to a few arrays before storing them in the database. When I query for a run I get a Pandas DataFrame back.

TcxFile Object:
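Here is a trimmed-down sketch of the approach; the real object tracks more fields, but the hand-splitting idea is the same.

class TcxFile:
    # Sketch of a TCX parser that splits the XML tags by hand instead
    # of using a full XML reader. Each <Trackpoint> block holds one
    # sample of time and heart rate.
    def __init__(self, path):
        self.times = []
        self.heart_rates = []
        with open(path) as f:
            text = f.read()
        for point in text.split('<Trackpoint>')[1:]:
            self.times.append(self._between(point, '<Time>', '</Time>'))
            self.heart_rates.append(int(self._between(point, '<Value>', '</Value>')))

    @staticmethod
    def _between(text, start_tag, end_tag):
        # Return the text between the first start_tag and the next end_tag.
        start = text.index(start_tag) + len(start_tag)
        return text[start:text.index(end_tag, start)]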

Making the plots required calculating the number of seconds, adding some best-fit lines, and calculating the differences in heart rate between the two runs.

First I pull the data into a DataFrame, split the sessions into two different DataFrames, and calculate seconds.
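Roughly like this, assuming the session_id, point_time, and heart_rate columns from the sketch above:

import pandas as pd

# Pull both runs out of the INTERVAL table in one query.
df = pd.read_sql("SELECT session_id, point_time, heart_rate FROM interval", conn)

run1 = df[df['session_id'] == 1].copy()
run2 = df[df['session_id'] == 2].copy()

# Seconds elapsed since the start of each run.
for run in (run1, run2):
    run['seconds'] = (run['point_time'] - run['point_time'].iloc[0]).dt.total_seconds()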

There is probably an easier way to calculate the best-fit lines; I just did it quick and dirty using NumPy and a 3rd-degree polynomial. I did filter the data to start the curve fitting after the first 60 seconds.
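Something like this, continuing with the DataFrames from above:

import numpy as np

def best_fit(run):
    # Quick and dirty: fit a 3rd-degree polynomial, skipping the first
    # 60 seconds while the heart rate is still ramping up.
    steady = run[run['seconds'] > 60]
    coeffs = np.polyfit(steady['seconds'], steady['heart_rate'], 3)
    return steady['seconds'], np.polyval(coeffs, steady['seconds'])

secs1, fit1 = best_fit(run1)
secs2, fit2 = best_fit(run2)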

To calculate the differences in heart rate I merged the DataFrames back together using seconds as the joining column and made a quick calc.
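In spirit it was just this; the merge assumes both runs logged a point at the same elapsed seconds:

# Join the two runs on elapsed seconds and compare heart rates.
merged = pd.merge(run1, run2, on='seconds', suffixes=('_run1', '_run2'))
merged['hr_diff'] = merged['heart_rate_run2'] - merged['heart_rate_run1']
print(merged['hr_diff'].mean())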

It only takes a couple of lines of code in Matplotlib to show all the data.
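More or less this:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.scatter(run1['seconds'], run1['heart_rate'], s=4, label='Run #1 (training)')
ax.scatter(run2['seconds'], run2['heart_rate'], s=4, label='Run #2 (time off)')
ax.plot(secs1, fit1, color='black')  # best-fit lines from above
ax.plot(secs2, fit2, color='black')
ax.set_xlabel('Seconds')
ax.set_ylabel('Heart Rate (bpm)')
ax.legend()
plt.show()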

What IDE uses these crazy colors? PyCharm from JetBrains. I've really enjoyed using Visual Studio Code, but I have one issue where VS Code doesn't always find unit tests in Python.

Tuesday, February 19, 2019

Flint - Connected to GeoGraphix


Glacier Geo checklist:
  • Write Python-based petrophysical software - CHECK
  • Use software in a bunch of basins - CHECK
  • Connect to Geographix directly without using any LAS files - CHECKITY CHECK YOURSELF #beforeya #reckyaself

Once in a blue moon I write about Flint (software I wrote for interpreting well logs). More often I am adding tweaks and features that do things more simply than commercial software does. This last quarter I added a database connection to GeoGraphix. This means I can view any digital curves and tops in GeoGraphix without exporting an LAS file first. Yes, and tops. Flint could become a commercial petrophysics tool that also has the middleware to query other oil and gas databases.

Pros of Flint that are missing in Prizm:
  • Petro model that can perform loops
  • Interactive curve editing (fix all those spikes on RHOB)
  • Depth shifting individual curves
  • Interactive curve splicing (no Excel necessary)
  • Runs on Windows, Mac, and Linux
  • Fast

All non-technical people are allowed to stop reading now. The rest gets into the weeds. So all you data and code hunters, continue!

GeoGraphix is a popular geologic interpretation suite used in the oil and gas industry. Independent companies large and small, like Chesapeake, use it because there are tools for mapping, raster interpretation, and keeping track of wells and metadata. The programs are built on a Sybase relational database. Why is that important? Because you can query anything in a database. Below I will show some intro examples.

I added a little option to connect to GeoGraphix. In about 5 seconds I can have a well open, or completely switch to a different project. Prizm can take about a minute to open before you can query for a well, and about another minute to switch from one GeoGraphix project to another.
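The connection itself is plain ODBC. Something along these lines works with pyodbc; the driver name and credentials below are placeholders, since every install is set up a little differently:

import pyodbc

# A GeoGraphix project sits on a Sybase (SQL Anywhere) database, so a
# SQL Anywhere ODBC driver can reach it. All parameters are placeholders.
conn = pyodbc.connect(
    driver='{SQL Anywhere 17}',
    host='localhost',
    dbn='MyProject',  # the project database name
    uid='dba',
    pwd='sql',
)
cursor = conn.cursor()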

Now how do I query for curves? A simple little SQL query returns all the curve data I need based on a UWI and a curve set name. Here is an example for an imported curve set (any LAS loaded into GGX goes into an Imported curve set).

SELECT DBA.GX_WELL_CURVE_VALUES.WELLID, DBA.GX_WELL_CURVE_VALUES.CURVESET,
       DBA.GX_WELL_CURVE_VALUES.CURVENAME, DBA.GX_WELL_CURVE_VALUES.VERSION,
       DBA.GX_WELL_CURVE_VALUES.CURVE_VALUES, DBA.GX_WELL_CURVE.CMD_TYPE,
       DBA.GX_WELL_CURVE.CURVE_UOM, DBA.GX_WELL_CURVE.DATE_MODIFIED,
       DBA.GX_WELL_CURVE.DESCRIPTION, DBA.GX_WELL_CURVE.TOOL_TYPE,
       DBA.GX_WELL_CURVE.REMARK, DBA.GX_WELL_CURVESET.TOPDEPTH,
       DBA.GX_WELL_CURVESET.BASEDEPTH, DBA.GX_WELL_CURVESET.DEPTHINCR,
       DBA.GX_WELL_CURVESET.LOG_JOB, DBA.GX_WELL_CURVESET.LOG_TRIP,
       DBA.GX_WELL_CURVESET.SOURCE_FILE, DBA.GX_WELL_CURVESET.TYPE,
       DBA.GX_WELL_CURVESET.FIELDDATA
FROM DBA.GX_WELL_CURVE_VALUES, DBA.GX_WELL_CURVE, DBA.GX_WELL_CURVESET
WHERE (GX_WELL_CURVE.WELLID = '%s')
  AND (GX_WELL_CURVE.CURVESET = '%s')
  AND (GX_WELL_CURVE_VALUES.WELLID = GX_WELL_CURVE.WELLID)
  AND (GX_WELL_CURVE.WELLID = GX_WELL_CURVESET.WELLID)
  AND (GX_WELL_CURVE.CURVENAME = GX_WELL_CURVE_VALUES.CURVENAME)
  AND (GX_WELL_CURVE.CURVESET = GX_WELL_CURVESET.CURVESET)
  AND (GX_WELL_CURVESET.CURVESET = GX_WELL_CURVE_VALUES.CURVESET)
ORDER BY GX_WELL_CURVE.CURVESET, GX_WELL_CURVE.CURVENAME, GX_WELL_CURVE.WELLID,
         GX_WELL_CURVE_VALUES.CURVESET, GX_WELL_CURVE_VALUES.CURVENAME

Curve data is not saved as one value per row; instead, an entire curve is saved in a single cell as a BLOB. If you pass that array of bytes to a simple little method you can get back an array of floats. Here is a little Python snippet. I get rid of LAS nulls (-999.25) and any chance of Not a Number (NaN).
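A sketch of that method; I am assuming 8-byte doubles here, so check the byte width against your own curves:

import struct
import numpy as np

def blob_to_floats(blob):
    # Unpack the raw bytes into floats (8-byte little-endian doubles
    # assumed; use '<%df' with a stride of 4 for single precision),
    # then drop LAS nulls and NaNs.
    values = struct.unpack('<%dd' % (len(blob) // 8), blob)
    return [v for v in values if v != -999.25 and not np.isnan(v)]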

What is the point of all of this? Keep coding, all you people out there! Small little snippets add up to big programs. A problem isn't solved in one fell swoop. It is usually solved with thousands of precise slices. The beautiful marble sculptures from Italy were not made with a sledgehammer.

Cheers,
Jon

#GeoGraphix #GlacierGeosciences #Petrophysics #Python #SQL #Dev