Monday 16 November 2015

ATWD Assignment: Code Example (3)

Mapping the data to MySQL 2: Using XSLT

Another approach to mapping the dataset would be to use XSLT to write out the necessary SQL Schema and Insert statements. We could then feed this "rendering" to MySQL using a PHP built pipe.

XSLT (as a xml dom to other format converter) was created to get this kind of job done. Hence XSLT is a great tool for generating code.

"I'd rather write programs that write programs than write programs" said ap.

We will stick to using XSLT 1.0 & the oXygen editor. If you don't have access to OXygen then use a tool like Kernow.

(On the second semester sister module ATWD2 we will consider XSLT 2.0/3.0, XPATH 3.0 & XQuery in much more detail. For instance, using XSLT to generate PHP code.)

Re-exmine the source file: transport_v1.xml

It's always good to examine the source file/s (once again) before embarking on a XSL stylesheet.

We won't make any structural changes to our input file but notice the following:

  • the by 'car' percent value is derived. A sum of 'as driver' + 'as passenger';
  • hence the last seven percent values for any ward should add up to 100.

(Points worth recalling when we come to develop a Schema.)

The eXtensible Stylesheet Language

XSL is a family of (W3C) recommendations for defining XML document transformation and presentation (formatting).

XSL is really a pattern oriented transformation language. XPATH is used to find the patterns in one or more input documents. When the (user defined) patterns are matched, XSL applies the transformations associated with those patterns and writes out the result.

This makes XSLT (the implementation of XSL) like SQL & Prolog, a declartive programming language. Expressing the logic of the computation rather than the control flow. This is in contrast to imperative programming languages (like PHP) where one or more algorithms are implemented in terms of explicit steps.

To write XSL productively you need some XML aware tools. oXygen and Kernow are great because you can load a source document and try out various XPath expressions interactively. The other reason is that Namespaces play a crucial role when developing with the XML family of technologies. Tools that are namespace aware can help greatly.

Developing the Stylesheet

In the script that follows we've used XSLT 1.0 & XPATH 1.0.

Note how the script takes advantage of the well-ordered data in the input file (i.e. the mode elements are in the same order within each ward element). This order (position) is used to generate keys.

That is, the position of the (parent) ward element in relation to its siblings is worked out using the xpath expression: count(../preceding-sibling::*) + 1 (line 25).

(Question: why is this order not significant for the PHP script (xmltodb.php) and how might we go about not relying on the order in the XSLT code.)

View the output file: transport_inserts.sql

Deconstruct the above script for questions, clarifications and criticisms. You can comment in the workshop sessions or ask a question below.


home url: http://www.cems.uwe.ac.uk/~p-chatterjee/2015-16/modules/atwd1/assignment/examples/assignmentcodeexample3.html

Thursday 12 November 2015

ATWD1 Assignment: Code Example (2)

Mapping the data to MySQL

It would have been nice to move on to XSLT, XPATH and XQUERY but it was noticed that many of you are still trying (and struggling) to crunch the original CSV file to feed MySQL.

Let's take a step back to the suggested ER model and consider why this may not be a good idea.

atwd_trasport ER

Can you notice some of the design decisions and subtleties in this model?

  • field names are short and unique across the model
  • foreign key names are reflections of primary key names
  • the year is included as part of a composite primary key in the ward_tmode entity - why?

Once the table (SQL) schemas have been mapped to MySQL (use workbench), we will need a program to generate & execute the INSERT statements.

But there's a problem getting at the data given the structure of the original CSV file.

Pulling the (unique) ward details and (unique) transport types from the CSV file isn't too difficult but generating the inserts for the join table "ward_tmode" is much trickier. We will have to generate keys.

Or is it?

We could insist that our first instinct was right and stick with the original CSV file. Processing this isn't impossible after all. We could use fancy PHP or MySQL counters and loops and regular expressions and (possibly) end up writing lots of spaghetti code.

However the effort wouldn't be effective or energy efficient. Remember that according to Larry Wall, one of the three great virtues of a programmer is Laziness. "The quality that makes you go to great effort to reduce overall energy expenditure." Or as we say here in England, "leaving time for the Pub".

We already have a DOM structured document transport_v1.xml holding the data and this (structured) format makes our job of feeding MySQL much easier.

Consider the following script using XPath 1.0 & PDO with prepared statements:

Deconstruct the script for questions, clarifications and criticisms. We will allocate 10 minutes at the begining of Monday's lecture for this purpose.


home url: http://www.cems.uwe.ac.uk/~p-chatterjee/2015-16/modules/atwd1/assignment/examples/assignmentcodeexample2.html

Tuesday 10 November 2015

ATWD1 Assignment: Code Example (1)

The original input file is a simple CSV file with 7 fields and 316 records. The first record is a header row.

We can process this CSV file and generate a simple XML file very easily.

Run

The output file isn't optimal for processing and also contains some repeating (unecessary) data. The location node repeats data already in the lat./long. nodes and the year value is repeated too.

However, we've taken a big leap. The data is now in a DOM compatible format!

For readabilty & easier processing purposes we would prefer a structure like the following:

For this transformation we can use PHP's built-in DOM facade library simpleXML or use XSLT (built for such xml-to-xml transformations).

Unfortunatly current PHP builds don't provide native support for XPath 2.0/3.0 so we have to do with XPath 1.0.

XPath 2.0/3.0 provide very powerful built in functions like distinct-values() so an expression like distinct-values(//record/Ward) would generate an array with unique wards.

(We can get around this by installing and using an external library like Saxon but lets use what we have - the current UWE/CEMS setup and XPath 1.0.)

Let's first consider a PHP only solution using the simpleXML and XMLWriter modules & XPath 1.0.

You can view the generated file transport_v1.xml

We'll next use XSLT 1.0 to do the same task and take a look at XSLT 2.0/3.0.


url: http://www.cems.uwe.ac.uk/~p-chatterjee/2015-16/modules/atwd1/assignment/examples/assignment_code_example1.html