Tuesday, 10 November 2015

ATWD1 Assignment: Code Example (1)

The original input file is a simple CSV file with 7 fields and 316 records. The first record is a header row.

We can process this CSV file and generate a simple XML file very easily.

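<?php
# CSV -> flat XML: one <record> per data row, one child element per column
$i = 0;          # rows read so far (including the header row)
$r = 0;          # data records written
$columns = 0;
$header = array();
$file = "Quality_of_Life_-_Travel_to_Work_2013.csv";
$xml = '';
if (($handle = fopen($file, "r")) !== FALSE) {
    $xml .= '<transport>';
    while (($data = fgetcsv($handle, 2048, ",")) !== FALSE) {
        $i++;
        if ($i == 1) {
            # the first row holds the column names; reuse them as element names
            $columns = count($data);
            $header = $data;
        } else {
            $r++;
            $xml .= '<record id="' . $r . '">';
            # iterate over columns, escaping &, < and > in each cell value
            for ($c = 0; $c < $columns; $c++) {
                $xml .= "<$header[$c]>" . htmlspecialchars($data[$c], ENT_XML1) . "</$header[$c]>";
            }
            $xml .= '</record>';
        }
    }
    $xml .= '</transport>';
    # only close the handle if fopen() succeeded
    fclose($handle);
}
# save the file as a DOM document
$doc = new DOMDocument();
$doc->loadXML($xml);
$doc->save("transport.xml");
# print out xml (header() returns nothing, so it is called, not echoed)
header("Content-type: text/xml");
echo $xml;
?>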


The output file isn't optimal for processing and also contains some repeated (unnecessary) data: the Location node duplicates the data already held in the latitude/longitude nodes, and the same year value is repeated in every record.

However, we've taken a big leap. The data is now in a DOM compatible format!
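Building XML by string concatenation only gives us a DOM-compatible document if every cell value is XML-safe: a single unescaped "&" in the CSV would make loadXML() fail. Here is a minimal sketch of the same CSV-to-flat-XML step wrapped in a function. The name csv_to_flat_xml() is hypothetical, it reads from an in-memory stream rather than the real file, and the two-row sample is made up.

```php
<?php
// Hypothetical helper: converts CSV text (header row first) into the same
// flat <transport><record id="n">...</record></transport> layout as the
// script above. Reads from an in-memory stream so it runs without a file.
function csv_to_flat_xml(string $csv): string
{
    $handle = fopen('php://memory', 'r+');
    fwrite($handle, $csv);
    rewind($handle);

    // first row: the column names become element names
    $header = fgetcsv($handle, 0, ',', '"', '\\');
    $xml = '<transport>';
    $id = 0;
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv() returns [null] for a blank line
        }
        $id++;
        $xml .= '<record id="' . $id . '">';
        foreach ($header as $c => $name) {
            // escape &, < and > so the cell text cannot break the markup
            $xml .= "<$name>" . htmlspecialchars($row[$c] ?? '', ENT_XML1) . "</$name>";
        }
        $xml .= '</record>';
    }
    fclose($handle);
    return $xml . '</transport>';
}

// made-up sample: the second Question contains a bare "&"
$csv = "Ward,Question,Value\nAshley,% travel to work by bus,10.9\nCotham,% walk & cycle,5\n";
$doc = new DOMDocument();
$doc->loadXML(csv_to_flat_xml($csv)); // would fail if "&" were left unescaped
echo $doc->saveXML(), "\n";
```

Escaping each value as it is written keeps the string-building approach from the original script; the alternative is to create nodes with DOMDocument::createElement() and let the DOM handle escaping.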

For readability and easier processing, we would prefer a structure like the following:

<?xml version="1.0" encoding="UTF-8"?>
<transport year='2013'>
    <ward lat='51.46619' long='-2.583456' name='Ashley'>
        <mode type='car (as driver)' percent='25.3' />
        <mode type='car (as passenger)' percent='3.7' />
        <mode type='another' percent='0' />
        <mode type='bus' percent='10.9' />
        <mode type='car' percent='29' />
        <mode type='cycle' percent='20.4' />
        <mode type='moped/motorbike' percent='0.6' />
        <mode type='train' percent='1.4' />
        <mode type='walking' percent='37.7' />
    </ward>
    <!-- more wards -->
</transport>
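The core of this transformation is a group-by: collect the flat records per ward, keep the repeated values (year, lat/long) only once, and write one mode element per record. A minimal sketch in plain PHP, assuming the records have already been parsed into arrays. The field names mirror the CSV columns used later (Ward, Approx_lat, Approx_long, Value, Year); the 'mode' key holding an already-mapped type and both function names are hypothetical. The values are taken from the Ashley sample above, and XMLWriter writes to memory instead of a file.

```php
<?php
// Hypothetical: group flat record arrays by ward name.
function group_by_ward(array $records): array
{
    $wards = [];
    foreach ($records as $rec) {
        $name = $rec['Ward'];
        if (!isset($wards[$name])) {
            // lat/long repeat in every record of a ward: store them once
            $wards[$name] = [
                'lat'   => $rec['Approx_lat'],
                'long'  => $rec['Approx_long'],
                'modes' => [],
            ];
        }
        $wards[$name]['modes'][] = ['type' => $rec['mode'], 'percent' => $rec['Value']];
    }
    return $wards;
}

// Hypothetical: write the grouped data in the nested layout shown above.
function write_nested_xml(string $year, array $wards): string
{
    $w = new XMLWriter();
    $w->openMemory();
    $w->setIndent(true);
    $w->startDocument('1.0', 'UTF-8');
    $w->startElement('transport');
    $w->writeAttribute('year', $year); // the year is written once, not per record
    foreach ($wards as $name => $ward) {
        $w->startElement('ward');
        $w->writeAttribute('lat', $ward['lat']);
        $w->writeAttribute('long', $ward['long']);
        $w->writeAttribute('name', (string) $name);
        foreach ($ward['modes'] as $mode) {
            $w->startElement('mode');
            $w->writeAttribute('type', $mode['type']);
            $w->writeAttribute('percent', $mode['percent']);
            $w->endElement(); // empty element, written as <mode .../>
        }
        $w->endElement(); // </ward>
    }
    $w->endElement(); // </transport>
    $w->endDocument();
    return $w->outputMemory();
}

$records = [
    ['Ward' => 'Ashley', 'Approx_lat' => '51.46619', 'Approx_long' => '-2.583456',
     'mode' => 'bus', 'Value' => '10.9', 'Year' => '2013'],
    ['Ward' => 'Ashley', 'Approx_lat' => '51.46619', 'Approx_long' => '-2.583456',
     'mode' => 'walking', 'Value' => '37.7', 'Year' => '2013'],
];
echo write_nested_xml($records[0]['Year'], group_by_ward($records));
```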

For this transformation we can either use SimpleXML, PHP's built-in facade over the DOM, or use XSLT, which is designed for exactly this kind of XML-to-XML transformation.

Unfortunately, current PHP builds don't provide native support for XPath 2.0/3.0, so we have to make do with XPath 1.0.

XPath 2.0/3.0 provide powerful built-in functions such as distinct-values(), so an expression like distinct-values(//record/Ward) would return a sequence of unique ward names.

(We can get around this by installing and using an external library such as Saxon, but let's use what we have: the current UWE/CEMS setup and XPath 1.0.)
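XPath 1.0 can still emulate distinct-values() with a well-known idiom: keep a Ward only if no earlier Ward in document order has the same string value. A minimal sketch using PHP's DOMXPath; the XML string is a made-up miniature of transport.xml.

```php
<?php
// made-up miniature of the flat transport.xml layout
$flat = '<transport>'
    . '<record id="1"><Ward>Ashley</Ward></record>'
    . '<record id="2"><Ward>Ashley</Ward></record>'
    . '<record id="3"><Ward>Cotham</Ward></record>'
    . '</transport>';

$doc = new DOMDocument();
$doc->loadXML($flat);
$xpath = new DOMXPath($doc);

// ". = preceding::Ward" is true if ANY earlier Ward has the same text,
// so not(...) keeps only the first occurrence of each ward name
$unique = [];
foreach ($xpath->query('//record/Ward[not(. = preceding::Ward)]') as $node) {
    $unique[] = $node->textContent;
}
print_r($unique); // lists Ashley and Cotham
```

The check compares each Ward against all the Wards before it, so it is quadratic in the number of records; that is fine for a few hundred rows. The same expression can be passed to SimpleXML's $xml->xpath().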

Let's first consider a PHP-only solution using the SimpleXML and XMLWriter modules and XPath 1.0.

<?php
# set a default timezone so PHP doesn't warn about an unset date.timezone
# on servers where php.ini doesn't define one
@date_default_timezone_set("GMT");
# create a simpleXML object with file as input
$xml = simplexml_load_file("transport.xml");
# XPath 1.0 has no distinct-values(), so fetch every Ward element,
# cast each one to a string and remove the duplicates in PHP
$wards = array_map('strval', $xml->xpath("//Ward"));
$ward = array_unique($wards);
# create and initialise a new XMLWriter object
$writer = new XMLWriter();
$writer->openURI('transport_v1.xml');
$writer->startDocument("1.0", "UTF-8");
# setIndent() takes a boolean; setIndentString() sets the indent width
$writer->setIndent(true);
$writer->setIndentString("    ");
# get the year value (repeated in every record, so take the first)
$year = (string) $xml->xpath("//Year")[0];
# start writing the new xml
$writer->startElement('transport');
$writer->writeAttribute('year', $year);
# walk the $ward array & process data
foreach ($ward as $w) {
    # get an array of records matching the current ward
    $records = $xml->xpath("//*[Ward='$w']");
    # get the geo-code from the first matching record
    $lat = (string) $records[0]->Approx_lat;
    $long = (string) $records[0]->Approx_long;
    # start the ward element
    $writer->startElement('ward');
    $writer->writeAttribute('lat', $lat);
    $writer->writeAttribute('long', $long);
    $writer->writeAttribute('name', $w);
    # walk the records array & process data
    foreach ($records as $record) {
        $question = (string) $record->Question;
        $percent = (string) $record->Value;
        # start the mode element
        $writer->startElement('mode');
        # strpos() returns 0 for a match at offset 0, which a loose comparison
        # treats as false, so test with !== false; the more specific car
        # phrases are checked before "work by car"
        switch (true) {
            case strpos($question, "as driver") !== false:
                $writer->writeAttribute('type', 'car (as driver)');
                $writer->writeAttribute('percent', $percent);
                break;
            case strpos($question, "as passenger") !== false:
                $writer->writeAttribute('type', 'car (as passenger)');
                $writer->writeAttribute('percent', $percent);
                break;
            case strpos($question, "another") !== false:
                $writer->writeAttribute('type', 'another');
                $writer->writeAttribute('percent', $percent);
                break;
            case strpos($question, "bus") !== false:
                $writer->writeAttribute('type', 'bus');
                $writer->writeAttribute('percent', $percent);
                break;
            case strpos($question, "work by car") !== false:
                $writer->writeAttribute('type', 'car');
                $writer->writeAttribute('percent', $percent);
                break;
            case strpos($question, "cycle") !== false:
                $writer->writeAttribute('type', 'cycle');
                $writer->writeAttribute('percent', $percent);
                break;
            case strpos($question, "moped/motorbike") !== false:
                $writer->writeAttribute('type', 'moped/motorbike');
                $writer->writeAttribute('percent', $percent);
                break;
            case strpos($question, "train") !== false:
                $writer->writeAttribute('type', 'train');
                $writer->writeAttribute('percent', $percent);
                break;
            case strpos($question, "walking") !== false:
                $writer->writeAttribute('type', 'walking');
                $writer->writeAttribute('percent', $percent);
                break;
        }
        $writer->endElement(); # </mode>
    }
    $writer->endElement(); # </ward>
}
$writer->endElement(); # </transport>
# end the document and flush to file
$writer->endDocument();
$writer->flush();
?>
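The long switch (true) block can also be written as a lookup table, which keeps the phrase-to-type mapping in one place and writes the percent attribute only once. A minimal sketch; mode_type() is a hypothetical name, and the needle/type pairs are copied from the switch in the script in the same order, since the first match wins. It also shows why strpos() must be compared with !== false.

```php
<?php
// Hypothetical helper: map a Question string to a short mode type.
// The first matching needle wins, so the specific car phrases come first.
function mode_type(string $question): ?string
{
    $types = [
        'as driver'       => 'car (as driver)',
        'as passenger'    => 'car (as passenger)',
        'another'         => 'another',
        'bus'             => 'bus',
        'work by car'     => 'car',
        'cycle'           => 'cycle',
        'moped/motorbike' => 'moped/motorbike',
        'train'           => 'train',
        'walking'         => 'walking',
    ];
    foreach ($types as $needle => $type) {
        // strpos() returns 0 for a match at the start, and 0 == false,
        // so an explicit !== false test is required
        if (strpos($question, $needle) !== false) {
            return $type;
        }
    }
    return null; // no known phrase found
}

var_dump(strpos('bus', 'bus') == true); // bool(false): offset 0 looks like "no match"
var_dump(mode_type('bus'));             // string(3) "bus"
```

Inside the records loop, the switch would then reduce to a mode_type() call followed by two writeAttribute() calls, skipping the attributes when it returns null.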

You can view the generated file transport_v1.xml
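The payoff of the nested layout is simpler querying: one XPath 1.0 expression answers "what share of Ashley residents travel to work by bus?". A minimal sketch with SimpleXML; the XML string is a short excerpt of the target structure shown earlier, not the full generated file.

```php
<?php
// excerpt of the target structure (values from the Ashley sample)
$nested = <<<XML
<transport year="2013">
  <ward lat="51.46619" long="-2.583456" name="Ashley">
    <mode type="bus" percent="10.9"/>
    <mode type="walking" percent="37.7"/>
  </ward>
</transport>
XML;

$transport = simplexml_load_string($nested);
// select the percent attribute of Ashley's bus mode
$hits = $transport->xpath("//ward[@name='Ashley']/mode[@type='bus']/@percent");
echo (string) $hits[0], "\n"; // 10.9
```

Against the flat file, the same question would need to match on both the Ward text and a substring of the Question text.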

We'll next use XSLT 1.0 to do the same task and take a look at XSLT 2.0/3.0.


url: http://www.cems.uwe.ac.uk/~p-chatterjee/2015-16/modules/atwd1/assignment/examples/assignment_code_example1.html
