Monday, 16 November 2015

ATWD Assignment: Code Example (3)

Mapping the data to MySQL 2: Using XSLT

Another approach to mapping the dataset would be to use XSLT to write out the necessary SQL Schema and Insert statements. We could then feed this "rendering" to MySQL using a PHP built pipe.

XSLT (as a xml dom to other format converter) was created to get this kind of job done. Hence XSLT is a great tool for generating code.

"I'd rather write programs that write programs than write programs" said ap.

We will stick to using XSLT 1.0 & the oXygen editor. If you don't have access to OXygen then use a tool like Kernow.

(On the second semester sister module ATWD2 we will consider XSLT 2.0/3.0, XPATH 3.0 & XQuery in much more detail. For instance, using XSLT to generate PHP code.)

Re-exmine the source file: transport_v1.xml

It's always good to examine the source file/s (once again) before embarking on a XSL stylesheet.

We won't make any structural changes to our input file but notice the following:

  • the by 'car' percent value is derived. A sum of 'as driver' + 'as passenger';
  • hence the last seven percent values for any ward should add up to 100.

(Points worth recalling when we come to develop a Schema.)

The eXtensible Stylesheet Language

XSL is a family of (W3C) recommendations for defining XML document transformation and presentation (formatting).

XSL is really a pattern oriented transformation language. XPATH is used to find the patterns in one or more input documents. When the (user defined) patterns are matched, XSL applies the transformations associated with those patterns and writes out the result.

This makes XSLT (the implementation of XSL) like SQL & Prolog, a declartive programming language. Expressing the logic of the computation rather than the control flow. This is in contrast to imperative programming languages (like PHP) where one or more algorithms are implemented in terms of explicit steps.

To write XSL productively you need some XML aware tools. oXygen and Kernow are great because you can load a source document and try out various XPath expressions interactively. The other reason is that Namespaces play a crucial role when developing with the XML family of technologies. Tools that are namespace aware can help greatly.

Developing the Stylesheet

In the script that follows we've used XSLT 1.0 & XPATH 1.0.

Note how the script takes advantage of the well-ordered data in the input file (i.e. the mode elements are in the same order within each ward element). This order (position) is used to generate keys.

That is, the position of the (parent) ward element in relation to its siblings is worked out using the xpath expression: count(../preceding-sibling::*) + 1 (line 25).

(Question: why is this order not significant for the PHP script (xmltodb.php) and how might we go about not relying on the order in the XSLT code.)

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="UTF-8"/>
<xsl:template match="/">
TRUNCATE TABLE `ward`;
TRUNCATE TABLE `tmode`;
TRUNCATE TABLE `ward_tmode`;
<xsl:for-each select="transport/ward">
INSERT INTO ward(`ward_id`, `name`, `lat`, `long`)
VALUES ('<xsl:number value="position()" format="1" />', '<xsl:value-of select="@name"/>', '<xsl:value-of select="@lat"/>', '<xsl:value-of select="@long"/>');
</xsl:for-each>
<xsl:for-each select="transport/ward[1]/mode/@type">
INSERT INTO tmode(`tmode_id`, `mode`)
VALUES ('<xsl:number value="position()" format="1" />', '<xsl:value-of select="."/>');
</xsl:for-each>
<xsl:for-each select="transport/ward">
<xsl:for-each select="./mode">
INSERT INTO ward_tmode(`id_ward`, `id_tmode`, `year`, `percent`)
VALUES ('<xsl:number value="count(../preceding-sibling::*) + 1" format="1" />', '<xsl:number value="position()" format="1" />', '<xsl:value-of select="//transport/@year"/>', '<xsl:value-of select="@percent"/>');
</xsl:for-each>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

View the output file: transport_inserts.sql

Deconstruct the above script for questions, clarifications and criticisms. You can comment in the workshop sessions or ask a question below.


home url: http://www.cems.uwe.ac.uk/~p-chatterjee/2015-16/modules/atwd1/assignment/examples/assignmentcodeexample3.html

Thursday, 12 November 2015

ATWD1 Assignment: Code Example (2)

Mapping the data to MySQL

It would have been nice to move on to XSLT, XPATH and XQUERY but it was noticed that many of you are still trying (and struggling) to crunch the original CSV file to feed MySQL.

Let's take a step back to the suggested ER model and consider why this may not be a good idea.

atwd_trasport ER

Can you notice some of the design decisions and subtleties in this model?

  • field names are short and unique across the model
  • foreign key names are reflections of primary key names
  • the year is included as part of a composite primary key in the ward_tmode entity - why?

Once the table (SQL) schemas have been mapped to MySQL (use workbench), we will need a program to generate & execute the INSERT statements.

But there's a problem getting at the data given the structure of the original CSV file.

Pulling the (unique) ward details and (unique) transport types from the CSV file isn't too difficult but generating the inserts for the join table "ward_tmode" is much trickier. We will have to generate keys.

Or is it?

We could insist that our first instinct was right and stick with the original CSV file. Processing this isn't impossible after all. We could use fancy PHP or MySQL counters and loops and regular expressions and (possibly) end up writing lots of spaghetti code.

However the effort wouldn't be effective or energy efficient. Remember that according to Larry Wall, one of the three great virtues of a programmer is Laziness. "The quality that makes you go to great effort to reduce overall energy expenditure." Or as we say here in England, "leaving time for the Pub".

We already have a DOM structured document transport_v1.xml holding the data and this (structured) format makes our job of feeding MySQL much easier.

Consider the following script using XPath 1.0 & PDO with prepared statements:

<?php
# important
@date_default_timezone_set("GMT");
# create PDO object and intialise
$db = new PDO('mysql:host=localhost;dbname=bristol_transport;charset=utf8', 'root', '');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
# create a simpleXML object with file as input
$xml = simplexml_load_file("transport_v1.xml");
############### POPULATE ward TABLE ##################################
# set record counter
$rec = 0;
# get a array of wards (unique now)
$wards = $xml->xpath("//ward");
# empty the ward table & reset AUTO_INCREMENT
$stmt = $db->query('TRUNCATE TABLE ward');
# populate the ward table setting ward_id = $rec
foreach ($wards as $ward) {
# increment record counter
$rec++;
# insert data
$stmt = $db->prepare("INSERT INTO ward(`ward_id`, `name`, `lat`, `long`) VALUES(?, ?, ?, ?)");
$stmt->execute(array($rec, $ward['name'], $ward['lat'], $ward['long']));
}
############### POPULATE tmode TABLE #################################
#re-set record counter
$rec = 0;
# get a array of transport modes
$modes = $xml->xpath("//ward[1]/mode");
# empty the tmode table & reset AUTO_INCREMENT
$stmt = $db->query('TRUNCATE TABLE tmode');
# populate the tmode table setting ward_id = $rec
foreach ($modes as $mode) {
# increment record counter
$rec++;
# insert data
$stmt = $db->prepare("INSERT INTO tmode(`tmode_id`, `mode`) VALUES(?, ?)");
$stmt->execute(array($rec, $mode['type']));
# build up look-up array for for tmode_id
$tmode[$rec] = $mode['type'];
}
############### POPULATE ward_tmode TABLE ############################
#re-set record counter
$rec = 0;
# get the year value
$year = $xml->xpath("/transport/@year")[0];
# empty the ward_tmode table & reset AUTO_INCREMENT
$stmt = $db->query('TRUNCATE TABLE ward_tmode');
# populate the ward_tmode table setting id_ward = $rec &
# using the look-up array $tmode to find id_tmode
foreach ($wards as $ward) {
# increment record counter
$rec++;
# get a array of all mode entities in the current ward
$ward_modes = $xml->xpath("//ward[" . $rec . "]/mode");
# populate table ward_tmode
foreach ($ward_modes as $wm) {
# look up key value for id_tmode
$key = array_search((string) $wm['type'], $tmode);
# insert data
$stmt = $db->prepare("INSERT INTO ward_tmode(`id_ward`, `id_tmode`, `year`, `percent`) VALUES(?, ?, ?, ?)");
$stmt->execute(array($rec, $key, $year, $wm['percent']));
}
}
# close db connection
$db=null;
# all done
?>
view raw xml_to_db.php hosted with ❤ by GitHub

Deconstruct the script for questions, clarifications and criticisms. We will allocate 10 minutes at the begining of Monday's lecture for this purpose.


home url: http://www.cems.uwe.ac.uk/~p-chatterjee/2015-16/modules/atwd1/assignment/examples/assignmentcodeexample2.html

Tuesday, 10 November 2015

ATWD1 Assignment: Code Example (1)

The original input file is a simple CSV file with 7 fields and 316 records. The first record is a header row.

We can process this CSV file and generate a simple XML file very easily.

<?php
$i = 0;
$r = 0;
$columns = 0;
$file = "Quality_of_Life_-_Travel_to_Work_2013.csv";
$xml = '';
if (($handle = fopen($file, "r")) !== FALSE) {
$xml .= '<transport>';
while ($data = fgetcsv($handle, 2048, ",")) {
$i++;
if ($i == 1) {
$columns = count($data);
$header = $data;
} else {
$r++;
$xml .= '<record id="' . $r . '">';
# iterate over columns
for ($c = 0; $c < $columns; $c++) {
$xml .= "<$header[$c]>" . $data[$c] . "</$header[$c]>";
}
$xml .= '</record>';
}
}
$xml .= '</transport>';
}
fclose($handle);
# save the file as a DOM document
$doc = new DOMDocument();
$doc->loadXML($xml);
$doc->save("transport.xml");
# print out xml
echo header("Content-type: text/xml");
echo $xml;
?>
view raw csv_to_xml.php hosted with ❤ by GitHub

Run

The output file isn't optimal for processing and also contains some repeating (unecessary) data. The location node repeats data already in the lat./long. nodes and the year value is repeated too.

However, we've taken a big leap. The data is now in a DOM compatible format!

For readabilty & easier processing purposes we would prefer a structure like the following:

<?xml version="1.0" encoding="UTF-8"?>
<transport year='2013'>
<ward lat='51.46619' long='-2.583456' name='Ashley'>
<mode type='car(as driver)' percent='25.3' />
<mode type='car(as passenger)' percent='3.7' />
<mode type='another' percent='0' />
<mode type='bus' percent='10.9' />
<mode type='car' percent='29' />
<mode type='cycle' percent='20.4' />
<mode type='moped/motorbike' percent='0.6' />
<mode type='train' percent='1.4' />
<mode type='walking' percent='37.7' />
</ward>
<!-- more wards -->
</transport>

For this transformation we can use PHP's built-in DOM facade library simpleXML or use XSLT (built for such xml-to-xml transformations).

Unfortunatly current PHP builds don't provide native support for XPath 2.0/3.0 so we have to do with XPath 1.0.

XPath 2.0/3.0 provide very powerful built in functions like distinct-values() so an expression like distinct-values(//record/Ward) would generate an array with unique wards.

(We can get around this by installing and using an external library like Saxon but lets use what we have - the current UWE/CEMS setup and XPath 1.0.)

Let's first consider a PHP only solution using the simpleXML and XMLWriter modules & XPath 1.0.

<?php
# important
@date_default_timezone_set("GMT");
# create a simpleXML object with file as input
$xml = simplexml_load_file("transport.xml");
# generate a array holding unique wards
$wards = $xml->xpath("//Ward[not(preceding-sibling::Ward)]");
$ward = array_unique($wards);
# create and intialise new XMLWriter object
$writer = new XMLWriter();
$writer->openURI('transport_v1.xml');
$writer->startDocument("1.0");
$writer->setIndent(4);
# get the year value
$year = (string) $xml->xpath("//Year")[0];
# start writing the new xml
$writer->startElement('transport');
$writer->writeAttribute('year', $year);
# walk the $ward array & process data
foreach ($ward as $w) {
# get an array of records matching current ward
$records = $xml->xpath("//*[Ward='$w']");
# get the geo-code
$lat = (string) $records[0]->Approx_lat;
$long = (string) $records[0]->Approx_long;
# start the ward element
$writer->startElement('ward');
$writer->writeAttribute('lat', $lat);
$writer->writeAttribute('long', $long);
$writer->writeAttribute('name', $w);
# walk the records array & process data
foreach ($records as $record) {
# start the mode element
$writer->startElement('mode');
switch (true) {
case strpos((string) $record->Question, "as driver"):
$writer->writeAttribute('type', 'car (as driver)');
$writer->writeAttribute('percent', $record->Value);
break;
case strpos((string) $record->Question, "as passenger"):
$writer->writeAttribute('type', 'car (as passenger)');
$writer->writeAttribute('percent', $record->Value);
break;
case strpos((string) $record->Question, "another"):
$writer->writeAttribute('type', 'another');
$writer->writeAttribute('percent', $record->Value);
break;
case strpos((string) $record->Question, "bus"):
$writer->writeAttribute('type', 'bus');
$writer->writeAttribute('percent', $record->Value);
break;
case strpos((string) $record->Question, "work by car"):
$writer->writeAttribute('type', 'car');
$writer->writeAttribute('percent', $record->Value);
break;
case strpos((string) $record->Question, "cycle"):
$writer->writeAttribute('type', 'cycle');
$writer->writeAttribute('percent', $record->Value);
break;
case strpos((string) $record->Question, "moped/motorbike"):
$writer->writeAttribute('type', 'moped/motorbike');
$writer->writeAttribute('percent', $record->Value);
break;
case strpos((string) $record->Question, "train"):
$writer->writeAttribute('type', 'train');
$writer->writeAttribute('percent', $record->Value);
break;
case strpos((string) $record->Question, "walking"):
$writer->writeAttribute('type', 'walking');
$writer->writeAttribute('percent', $record->Value);
break;
}
$writer->endElement();
}
$writer->endElement();
}
# end the document and flush to file
$writer->endDocument();
$writer->flush();
?>
view raw tdom_to_xml.php hosted with ❤ by GitHub

You can view the generated file transport_v1.xml

We'll next use XSLT 1.0 to do the same task and take a look at XSLT 2.0/3.0.


url: http://www.cems.uwe.ac.uk/~p-chatterjee/2015-16/modules/atwd1/assignment/examples/assignment_code_example1.html