The original input file is a simple CSV file with 7 fields and 316 records. The first record is a header row.
We can process this CSV file and generate a simple XML file very easily.
<?php
$i = 0;
$r = 0;
$columns = 0;
$file = "Quality_of_Life_-_Travel_to_Work_2013.csv";
$xml = '';
if (($handle = fopen($file, "r")) !== FALSE) {
    $xml .= '<transport>';
    # compare with FALSE explicitly so that only end-of-file ends the loop
    while (($data = fgetcsv($handle, 2048, ",")) !== FALSE) {
        $i++;
        if ($i == 1) {
            # the first row holds the column names
            $columns = count($data);
            $header = $data;
        } else {
            $r++;
            $xml .= '<record id="' . $r . '">';
            # iterate over columns, escaping &, < and > in the values
            for ($c = 0; $c < $columns; $c++) {
                $value = htmlspecialchars($data[$c] ?? '', ENT_XML1);
                $xml .= "<$header[$c]>" . $value . "</$header[$c]>";
            }
            $xml .= '</record>';
        }
    }
    $xml .= '</transport>';
    # close the handle only if the file was opened
    fclose($handle);
}
# save the file as a DOM document
$doc = new DOMDocument();
$doc->loadXML($xml);
$doc->save("transport.xml");
# print out xml (header() returns nothing, so it is not echoed)
header("Content-type: text/xml");
echo $xml;
?>
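Building the document through the DOM API, rather than by concatenating strings, leaves the escaping of characters such as & and < to the library. The following is only a minimal sketch of that idea: the two CSV lines are made-up stand-ins for the real file (the question text in particular is invented), and it assumes the header names are valid XML element names.

```php
<?php
// Hypothetical sample rows standing in for the real CSV file.
$lines = [
    'Ward,Question,Value',
    'Ashley,"% who travel by bus & coach",10.9',
];

$doc  = new DOMDocument('1.0', 'UTF-8');
$root = $doc->appendChild($doc->createElement('transport'));

// The first line holds the column names.
$header = str_getcsv(array_shift($lines), ',', '"', '\\');

foreach ($lines as $n => $line) {
    $data   = str_getcsv($line, ',', '"', '\\');
    $record = $root->appendChild($doc->createElement('record'));
    $record->setAttribute('id', (string) ($n + 1));

    foreach ($header as $c => $name) {
        $field = $record->appendChild($doc->createElement($name));
        // createTextNode() takes raw text; &, < and > are escaped on output.
        $field->appendChild($doc->createTextNode($data[$c] ?? ''));
    }
}

$out = $doc->saveXML();
echo $out;
```

The same loop could read rows with fgetcsv() instead of the inline array; the DOM calls would stay unchanged.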
The output file isn't optimal for processing and also contains some repeated (unnecessary) data. The location node repeats data already in the latitude and longitude nodes, and the year value is repeated too.
However, we've taken a big leap. The data is now in a DOM-compatible format!
For readability and easier processing, we would prefer a structure like the following:
<?xml version="1.0" encoding="UTF-8"?>
<transport year='2013'>
    <ward lat='51.46619' long='-2.583456' name='Ashley'>
        <mode type='car(as driver)' percent='25.3' />
        <mode type='car(as passenger)' percent='3.7' />
        <mode type='another' percent='0' />
        <mode type='bus' percent='10.9' />
        <mode type='car' percent='29' />
        <mode type='cycle' percent='20.4' />
        <mode type='moped/motorbike' percent='0.6' />
        <mode type='train' percent='1.4' />
        <mode type='walking' percent='37.7' />
    </ward>
    <!-- more wards -->
</transport>
For this transformation we can use SimpleXML, PHP's built-in facade over the DOM, or XSLT (which was built for such XML-to-XML transformations).
Unfortunately, current PHP builds don't provide native support for XPath 2.0/3.0, so we have to make do with XPath 1.0.
XPath 2.0/3.0 provide very powerful built-in functions like distinct-values(), so an expression like distinct-values(//record/Ward) would return a sequence of the unique ward names.
(We can get around this by installing and using an external library like Saxon, but let's use what we have: the current UWE/CEMS setup and XPath 1.0.)
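XPath 1.0 has no distinct-values(), but a common workaround is to keep a node only when no earlier sibling carries the same value. The sketch below tries this on a made-up miniature of transport.xml; the second ward name is a placeholder, and the element names are assumed to match the generated file.

```php
<?php
// Made-up miniature of transport.xml; element names follow the post.
$xml = simplexml_load_string(
    '<transport>
        <record id="1"><Ward>Ashley</Ward></record>
        <record id="2"><Ward>Ashley</Ward></record>
        <record id="3"><Ward>Avonmouth</Ward></record>
    </transport>'
);

// Keep a Ward only if no preceding record has the same ward name.
$nodes = $xml->xpath('//record[not(Ward = preceding-sibling::record/Ward)]/Ward');

// Convert the SimpleXMLElement nodes to plain strings.
$wards = array_map('strval', $nodes);

print_r($wards);
```

The predicate compares each record against all of its preceding siblings, so the cost grows quadratically with the number of records. That is acceptable for a few hundred records, but it is one reason distinct-values() is preferable where it is available.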
Let's first consider a PHP-only solution using the SimpleXML and XMLWriter modules and XPath 1.0.
<?php
# important
@date_default_timezone_set("GMT");
# create a SimpleXML object with file as input
$xml = simplexml_load_file("transport.xml");
# generate an array holding the unique ward names
# (every record has exactly one Ward child, so we de-duplicate in PHP)
$ward = array_unique(array_map('strval', $xml->xpath("//record/Ward")));
# create and initialise new XMLWriter object
$writer = new XMLWriter();
$writer->openURI('transport_v1.xml');
$writer->startDocument("1.0", "UTF-8");
# setIndent() takes a boolean; the indent width is set separately
$writer->setIndent(true);
$writer->setIndentString('    ');
# get the year value
$year = (string) $xml->xpath("//Year")[0];
# map a phrase in the Question text to a mode type; the first match wins
$modes = array(
    "as driver"       => 'car (as driver)',
    "as passenger"    => 'car (as passenger)',
    "another"         => 'another',
    "bus"             => 'bus',
    "work by car"     => 'car',
    "cycle"           => 'cycle',
    "moped/motorbike" => 'moped/motorbike',
    "train"           => 'train',
    "walking"         => 'walking',
);
# start writing the new xml
$writer->startElement('transport');
$writer->writeAttribute('year', $year);
# walk the $ward array & process data
foreach ($ward as $w) {
    # get an array of records matching current ward
    $records = $xml->xpath("//*[Ward='$w']");
    # get the geo-code
    $lat = (string) $records[0]->Approx_lat;
    $long = (string) $records[0]->Approx_long;
    # start the ward element
    $writer->startElement('ward');
    $writer->writeAttribute('lat', $lat);
    $writer->writeAttribute('long', $long);
    $writer->writeAttribute('name', $w);
    # walk the records array & process data
    foreach ($records as $record) {
        $question = (string) $record->Question;
        foreach ($modes as $needle => $type) {
            # strpos() returns 0 for a match at the start, so compare with FALSE
            if (strpos($question, $needle) !== FALSE) {
                # write the mode element only when a phrase matched
                $writer->startElement('mode');
                $writer->writeAttribute('type', $type);
                $writer->writeAttribute('percent', (string) $record->Value);
                $writer->endElement();
                break;
            }
        }
    }
    # end the ward element
    $writer->endElement();
}
# end the document (this also closes the transport element) and flush to file
$writer->endDocument();
$writer->flush();
?>
You can view the generated file, transport_v1.xml.
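The script above runs one XPath query per ward. An alternative worth considering is to group the records in a single pass in PHP. This also sidesteps quoting problems when a ward name contains an apostrophe, because XPath 1.0 string literals have no escape mechanism. The sketch below uses a made-up miniature of transport.xml; the ward "St George's" and its value are placeholders, not figures from the data set.

```php
<?php
// Made-up miniature of transport.xml; element names follow the post.
$xml = simplexml_load_string(
    '<transport>
        <record id="1"><Ward>Ashley</Ward><Question>bus</Question><Value>10.9</Value></record>
        <record id="2"><Ward>Ashley</Ward><Question>train</Question><Value>1.4</Value></record>
        <record id="3"><Ward>St George\'s</Ward><Question>bus</Question><Value>0</Value></record>
    </transport>'
);

// Group values by ward name, then by question, in one pass over the records.
$byWard = [];
foreach ($xml->record as $record) {
    $name     = (string) $record->Ward;
    $question = (string) $record->Question;
    $byWard[$name][$question] = (string) $record->Value;
}

print_r($byWard);
```

Each key of $byWard is then one ward, and the inner array holds what is needed to write its mode elements with XMLWriter.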
We'll next use XSLT 1.0 to do the same task and take a look at XSLT 2.0/3.0.