Wednesday, March 29, 2017

Data Analytics Summit III at Harrisburg University of Science and Technology

Harrisburg University of Science and Technology (Harrisburg, Pennsylvania) has just finished hosting Data Analytics Summit III. This multi-day event features presenters from the private sector, government and government-related organizations, and academia. The talks cover research, practice, and more visionary ("big picture") topics. This year's theme was "Analytics Applied: Case Studies, Measuring Impact, and Communicating Results".

Regrettably, I was unable to attend this time because I was traveling for business, but I did attend Data Analytics Summit II, which was held in December of 2015. If you haven't been: Harrisburg University of Science and Technology does a nice job hosting this event. So far, the Data Analytics Summit has also been free of charge, so there is the prospect of free food if you are a starving grad student.

The university has generously provided links to video of the presentations from the most recent Summit:

http://analyticssummit.harrisburgu.edu/


Video links for the previous Summit, whose theme was unstructured data, can be found at the bottom of my article "Unstructured Data Mining - A Primer" (Apr-11-2016) over on icrunchdata:

https://icrunchdata.com/unstructured-data-mining-primer/

I encourage readers to explore this free resource.


Friday, March 17, 2017

Geographic Distances: A Quick Trip Around the Great Circle

Recently, I wanted to calculate the distance between locations on the Earth. I found a handy solution and thought readers might be interested. In my situation, the location data included ZIP codes (American postal codes), and I also had a look-up table of the latitude and longitude of the geometric centroid of each ZIP code. Because the areas identified by ZIP codes are usually geographically small, and if we make the "close enough" assumption that this planet is perfectly spherical, trigonometry allows distance calculations that are precise enough for most purposes.

Given the latitude and longitude of cities 'A' and 'B', the following line of MATLAB code will calculate the distance between the two coordinates "as the crow flies" (technically, the "great circle distance"), in kilometers:

DistanceKilometers = round(111.12 * acosd(cosd(LongA - LongB) * cosd(LatA) * cosd(LatB) + sind(LatA) * sind(LatB)));
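Written out as an equation, the line above is the spherical law of cosines for the central angle between the two points, scaled by 111.12 km per degree of arc. The notation here is my own, not the original source's: φ is latitude, λ is longitude, and arccos returns degrees (as MATLAB's acosd does).

```latex
\[
d_{\mathrm{km}} = 111.12 \cdot \arccos\!\left( \cos(\lambda_A - \lambda_B)\,\cos\varphi_A\,\cos\varphi_B + \sin\varphi_A\,\sin\varphi_B \right)
\]
\[
111.12 = 60 \times 1.852 = \frac{\pi}{180}\,R
\quad\Longrightarrow\quad R \approx 6366.7\ \mathrm{km}
\]
```

The constant 111.12 is 60 nautical miles of 1.852 km each, i.e. one degree of great-circle arc. That implies a spherical Earth with a radius of about 6,367 km, close to the commonly used mean radius of 6,371 km.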

Note that latitude and longitude are expected as signed decimal degrees (the examples below use negative numbers for west longitudes). If your data is in degrees/minutes/seconds, a quick conversion will be needed.
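A minimal sketch of that conversion, assuming degrees, minutes, and seconds arrive as separate non-negative numbers and the hemisphere is supplied as a separate sign (+1 for north/east, -1 for south/west). Keeping the sign separate avoids losing it for values such as -0° 30'. The name `DmsToDecimal` is hypothetical, not a built-in function.

```matlab
% Hypothetical helper: degrees/minutes/seconds -> signed decimal degrees.
% Sign is +1 (north/east) or -1 (south/west); Deg, Min, Sec are non-negative.
% Element-wise operators let it work on whole columns of a look-up table.
DmsToDecimal = @(Sign, Deg, Min, Sec) Sign .* (Deg + Min / 60 + Sec / 3600);

% Example: approximately the New York coordinates used below
% (40 deg 39' 51.4" N, 73 deg 56' 18.6" W)
LatA  = DmsToDecimal( 1, 40, 39, 51.4)
LongA = DmsToDecimal(-1, 73, 56, 18.6)
```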

I've checked this formula against a second source and quickly verified it using a few pairs of cities:


% 'A' = New York
% 'B' = Atlanta
% Random on-line reference: 1202km
LatA = 40.664274;
LongA =  -73.9385;
LatB = 33.762909;
LongB = -84.422675;
DistanceKilometers = round(111.12 * acosd(cosd(LongA - LongB) * cosd(LatA) * cosd(LatB) + sind(LatA) * sind(LatB)))

DistanceKilometers =

        1202


% 'A' = New York
% 'B' = Los Angeles
% Random on-line reference: 3940km (less than 0.5% difference)
LatA = 40.664274;
LongA =  -73.9385;
LatB = 34.019394;
LongB = -118.410825;
DistanceKilometers = round(111.12 * acosd(cosd(LongA - LongB) * cosd(LatA) * cosd(LatB) + sind(LatA) * sind(LatB)))

DistanceKilometers =

        3955
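The one-liner uses the matrix operator `*`, which is fine for scalars but fails when, say, comparing one ZIP centroid against a whole column of the look-up table. Below is a minimal sketch, assuming column vectors of signed decimal degrees: the same formula wrapped in anonymous functions with element-wise operators. The names `CosCentral` and `GreatCircleKm` are my own (hypothetical). The clamp to [-1, 1] is also my addition rather than part of the original formula. It guards against floating-point rounding pushing the argument slightly above 1, for example for two identical points, in which case acosd would return a complex number.

```matlab
% Cosine of the central angle, clamped to [-1, 1] so rounding error can
% never push it outside the domain of acosd.
CosCentral = @(LatA, LongA, LatB, LongB) min(max( ...
    cosd(LongA - LongB) .* cosd(LatA) .* cosd(LatB) + sind(LatA) .* sind(LatB), ...
    -1), 1);

% Great-circle distance in kilometers, using 111.12 km per degree of arc
% as in the one-liner above.
GreatCircleKm = @(LatA, LongA, LatB, LongB) ...
    111.12 * acosd(CosCentral(LatA, LongA, LatB, LongB));

% Example: New York to Atlanta and to Los Angeles in a single call
LatNY = 40.664274;  LongNY = -73.9385;
LatOther  = [33.762909; 34.019394];
LongOther = [-84.422675; -118.410825];
DistanceKilometers = round(GreatCircleKm(LatNY, LongNY, LatOther, LongOther))
```

Because the scalar New York coordinates are combined element-wise with the column vectors, the result is one distance per row of the look-up table.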



References:

"How Far is Berlin?", by Alan Zeichick, published in the Sep-1991 issue of "Computer Language" magazine. Note that Zeichick credits an HP-27 scientific calculator as his source, from which he reverse-engineered the formula above.

"Trigonometry DeMYSTiFieD, 2nd edition", by Stan Gibilisco (ISBN: 978-0-07-178024-7)