Monday 16 November 2015

ATWD Assignment: Code Example (3)

Mapping the data to MySQL 2: Using XSLT

Another approach to mapping the dataset would be to use XSLT to write out the necessary SQL schema and INSERT statements. We could then feed this "rendering" to MySQL using a PHP-built pipe.
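The "pipe" can be as simple as PHP's popen() streaming the generated SQL into the mysql command-line client. A minimal sketch, assuming the XSLT output has been saved as transport_inserts.sql and that a local database, user and password exist (all placeholders here):

<?php
// Minimal sketch: stream the XSLT-generated SQL into the mysql client.
// The file name, database name and credentials below are placeholders/assumptions.
$sql = file_get_contents('transport_inserts.sql');

$pipe = popen('mysql -u atwd --password=secret atwd', 'w');  // one-way pipe to mysql
if ($pipe === false) {
    die("Could not open a pipe to the mysql client\n");
}
fwrite($pipe, $sql);
pclose($pipe);

echo "SQL fed to MySQL\n";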

XSLT (as a converter from an XML DOM to other formats) was created for exactly this kind of job, which makes it a great tool for generating code.

"I'd rather write programs that write programs than write programs" said ap.

We will stick to using XSLT 1.0 & the oXygen editor. If you don't have access to oXygen then use a tool like Kernow.

(In the second-semester sister module, ATWD2, we will consider XSLT 2.0/3.0, XPath 3.0 & XQuery in much more detail - for instance, using XSLT to generate PHP code.)

Re-examine the source file: transport_v1.xml

It's always good to examine the source file/s (once again) before embarking on an XSL stylesheet.

We won't make any structural changes to our input file but notice the following:

  • the 'car' percent value is derived: it is the sum of 'as driver' and 'as passenger';
  • hence the last seven percent values for any ward should add up to 100 (a quick check is sketched below).

(Points worth recalling when we come to develop a Schema.)
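A quick way to confirm this is a throwaway XPath 1.0 check over the file. A minimal sketch using SimpleXML - the element and attribute names used here (ward, mode, a name attribute, percentages as element text) are assumptions, so adjust the paths to the real structure of transport_v1.xml:

<?php
// Throwaway sanity check: the last seven mode percentages per ward should total ~100.
// Element/attribute names are assumptions - adjust to the real file structure.
$xml = simplexml_load_file('transport_v1.xml');

foreach ($xml->xpath('//ward') as $ward) {
    $total = 0;
    foreach ($ward->xpath('mode[position() > last() - 7]') as $mode) {
        $total += (float) $mode;              // percentage held as element text
    }
    if (abs($total - 100) > 0.5) {
        echo "Check ward '{$ward['name']}': percentages total $total\n";
    }
}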

The eXtensible Stylesheet Language

XSL is a family of (W3C) recommendations for defining XML document transformation and presentation (formatting).

XSL is really a pattern-oriented transformation language. XPath is used to find the patterns in one or more input documents. When the (user-defined) patterns are matched, XSL applies the transformations associated with those patterns and writes out the result.

This makes XSLT (the transformation language of the XSL family), like SQL & Prolog, a declarative programming language: it expresses the logic of the computation rather than the control flow. This is in contrast to imperative programming languages (like PHP), where one or more algorithms are implemented as explicit steps.

To write XSL productively you need some XML-aware tools. oXygen and Kernow are great because you can load a source document and try out various XPath expressions interactively. Another reason is that namespaces play a crucial role when developing with the XML family of technologies, and namespace-aware tools can help greatly.

Developing the Stylesheet

In the script that follows we've used XSLT 1.0 & XPath 1.0.

Note how the script takes advantage of the well-ordered data in the input file (i.e. the mode elements are in the same order within each ward element). This order (position) is used to generate keys.

That is, the position of the (parent) ward element in relation to its siblings is worked out using the XPath expression: count(../preceding-sibling::*) + 1 (line 25).
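To see what that expression yields, you can evaluate it outside the stylesheet with PHP's DOMXPath. A minimal sketch - the ward/mode element names and the name attribute are assumptions, so adjust them to the real structure of transport_v1.xml:

<?php
// Evaluate the same position-derived key with DOMXPath, using a mode element as context.
// Element and attribute names are assumptions.
$doc = new DOMDocument();
$doc->load('transport_v1.xml');
$xp = new DOMXPath($doc);

foreach ($xp->query('//ward/mode') as $mode) {
    // 1-based position of the parent ward among its siblings = the generated ward key
    $wardKey = $xp->evaluate('count(../preceding-sibling::*) + 1', $mode);
    $ward    = $mode->parentNode;
    echo $ward->getAttribute('name') . ' => ward key ' . $wardKey . "\n";
    break;  // one example is enough
}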

(Question: why is this order not significant for the PHP script (xmltodb.php), and how might we avoid relying on the order in the XSLT code?)

View the output file: transport_inserts.sql

Deconstruct the above script for questions, clarifications and criticisms. You can comment in the workshop sessions or ask a question below.


home url: http://www.cems.uwe.ac.uk/~p-chatterjee/2015-16/modules/atwd1/assignment/examples/assignmentcodeexample3.html

Thursday 12 November 2015

ATWD1 Assignment: Code Example (2)

Mapping the data to MySQL

It would have been nice to move on to XSLT, XPath and XQuery, but we've noticed that many of you are still trying (and struggling) to crunch the original CSV file to feed MySQL.

Let's take a step back to the suggested ER model and consider why this may not be a good idea.

[ER diagram: atwd_transport]

Can you notice some of the design decisions and subtleties in this model?

  • field names are short and unique across the model
  • foreign key names are reflections of primary key names
  • the year is included as part of a composite primary key in the ward_tmode entity - why?

Once the table (SQL) schemas have been mapped to MySQL (use MySQL Workbench), we will need a program to generate & execute the INSERT statements.

But there's a problem getting at the data given the structure of the original CSV file.

Pulling the (unique) ward details and (unique) transport types from the CSV file isn't too difficult but generating the inserts for the join table "ward_tmode" is much trickier. We will have to generate keys.

Or is it?

We could insist that our first instinct was right and stick with the original CSV file. Processing this isn't impossible after all. We could use fancy PHP or MySQL counters and loops and regular expressions and (possibly) end up writing lots of spaghetti code.

However the effort wouldn't be effective or energy efficient. Remember that according to Larry Wall, one of the three great virtues of a programmer is Laziness. "The quality that makes you go to great effort to reduce overall energy expenditure." Or as we say here in England, "leaving time for the Pub".

We already have a DOM structured document transport_v1.xml holding the data and this (structured) format makes our job of feeding MySQL much easier.

Consider the following script using XPath 1.0 & PDO with prepared statements:
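The embedded script isn't reproduced here, but the shape of the approach is worth sketching. This is a minimal sketch rather than the actual xmltodb.php: the table, column and element names loosely follow the ER model above but are assumptions, as are the credentials, and it assumes auto-increment primary keys on the ward and tmode tables:

<?php
// Sketch only: pull the data out of transport_v1.xml with XPath 1.0 (SimpleXML)
// and feed MySQL through PDO prepared statements.
// Table/column/element names and credentials are assumptions.
$xml = simplexml_load_file('transport_v1.xml');

$db = new PDO('mysql:host=localhost;dbname=atwd;charset=utf8', 'atwd', 'secret');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$insertWard = $db->prepare('INSERT INTO ward (ward_name, lat, lng) VALUES (?, ?, ?)');
$insertMode = $db->prepare('INSERT INTO tmode (tmode_name) VALUES (?)');
$insertLink = $db->prepare(
    'INSERT INTO ward_tmode (ward_id, tmode_id, year, percent) VALUES (?, ?, ?, ?)');

$modeIds = array();  // transport-type name => generated tmode_id

foreach ($xml->xpath('//ward') as $ward) {
    $insertWard->execute(array((string) $ward['name'],
                               (float) $ward->lat, (float) $ward->long));
    $wardId = $db->lastInsertId();

    foreach ($ward->mode as $mode) {
        $name = (string) $mode['type'];
        if (!isset($modeIds[$name])) {           // insert each transport type only once
            $insertMode->execute(array($name));
            $modeIds[$name] = $db->lastInsertId();
        }
        // composite key (ward_id, tmode_id, year) plus the percentage value
        $insertLink->execute(array($wardId, $modeIds[$name],
                                   (int) $ward['year'], (float) $mode));
    }
}

Because the transport types are looked up by name rather than by position, a script along these lines doesn't depend on the mode elements appearing in the same order within every ward.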

Deconstruct the script for questions, clarifications and criticisms. We will allocate 10 minutes at the beginning of Monday's lecture for this purpose.


home url: http://www.cems.uwe.ac.uk/~p-chatterjee/2015-16/modules/atwd1/assignment/examples/assignmentcodeexample2.html

Tuesday 10 November 2015

ATWD1 Assignment: Code Example (1)

The original input file is a simple CSV file with 7 fields and 316 records. The first record is a header row.

We can process this CSV file and generate a simple XML file very easily.

Run
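The conversion script is linked above rather than shown, but the idea is simple enough to sketch. A minimal version using fgetcsv and XMLWriter - the input/output file names here are assumptions, and the element names simply mirror the CSV header row:

<?php
// Sketch of the CSV-to-XML step. File names are assumptions; one <record> element is
// written per CSV row, with one child element per column (named after the header row).
$csv = fopen('transport.csv', 'r');
$header = fgetcsv($csv);                        // first record is the header row

$xw = new XMLWriter();
$xw->openURI('transport_records.xml');
$xw->startDocument('1.0', 'UTF-8');
$xw->setIndent(true);
$xw->startElement('records');

while (($row = fgetcsv($csv)) !== false) {
    $xw->startElement('record');
    foreach ($header as $i => $field) {
        $name = preg_replace('/\W+/', '_', trim($field));   // make a safe element name
        $xw->writeElement($name, $row[$i]);
    }
    $xw->endElement();                          // </record>
}

$xw->endElement();                              // </records>
$xw->endDocument();
$xw->flush();
fclose($csv);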

The output file isn't optimal for processing and also contains some repeating (unnecessary) data. The location node repeats data already in the lat./long. nodes, and the year value is repeated too.

However, we've taken a big leap. The data is now in a DOM compatible format!

For readability & easier processing we would prefer a structure like the following:

For this transformation we can use PHP's built-in DOM facade library SimpleXML, or use XSLT (built for exactly this kind of XML-to-XML transformation).

Unfortunately, current PHP builds don't provide native support for XPath 2.0/3.0, so we have to make do with XPath 1.0.

XPath 2.0/3.0 provide very powerful built-in functions like distinct-values(), so an expression like distinct-values(//record/Ward) would return the unique ward names in one go.

(We could get around this by installing and using an external library like Saxon, but let's use what we have - the current UWE/CEMS setup and XPath 1.0.)

Let's first consider a PHP-only solution using the SimpleXML and XMLWriter modules & XPath 1.0.
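That script isn't reproduced here, but the general shape is worth sketching. This is a minimal sketch rather than the actual code: the element names (record, Ward, Mode, Percent) and file names are assumptions, and array_unique() stands in for the missing distinct-values():

<?php
// Sketch only: group the flat <record> elements by ward and write the restructured file.
// Element and file names are assumptions.
$src = simplexml_load_file('transport_records.xml');

// XPath 1.0 workaround for distinct-values(//record/Ward)
$wards = array_unique(array_map('strval', $src->xpath('//record/Ward')));

$xw = new XMLWriter();
$xw->openURI('transport_v1.xml');
$xw->startDocument('1.0', 'UTF-8');
$xw->setIndent(true);
$xw->startElement('wards');

foreach ($wards as $wardName) {
    $xw->startElement('ward');
    $xw->writeAttribute('name', $wardName);

    // all the records belonging to this ward (assumes no quotes in ward names)
    foreach ($src->xpath("//record[Ward='$wardName']") as $rec) {
        $xw->startElement('mode');
        $xw->writeAttribute('type', (string) $rec->Mode);
        $xw->text((string) $rec->Percent);
        $xw->endElement();                      // </mode>
    }
    $xw->endElement();                          // </ward>
}

$xw->endElement();                              // </wards>
$xw->endDocument();
$xw->flush();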

You can view the generated file transport_v1.xml

We'll next use XSLT 1.0 to do the same task and take a look at XSLT 2.0/3.0.


url: http://www.cems.uwe.ac.uk/~p-chatterjee/2015-16/modules/atwd1/assignment/examples/assignment_code_example1.html

Friday 26 September 2014

Upcoming Lecture & Workshop - 30/10

Lecture 2

A brief history of the WWW and some of its standards & protocols. Hopefully some of this you'll already know, although you are advised to follow the resources on the module page.

Workshop 2

You should all now have an XML file with the data provided in Workshop 1.

You hand-encoded this file and hopefully also managed to write a PHP (or Perl or Python, remember) script to get the job done.

In the coming workshop, we intend to consider a number of ways of processing this file (to generate HTML5 for example) - particularly using XSLT.

If you think your XML generation script is good, you should post it in the comments section of this blog.

See you all on Tuesday.

Monday 25 February 2013

Example of JSON server / javascript client

Here is the code from the lecture last week that shows how to consume a JSON web service using JavaScript (jQuery).

First the client, then the server PHP below
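The embedded snippets from the lecture don't appear in this post. As a placeholder, here is a minimal sketch of what the server side might look like - the data and field names are invented for illustration, and the client would simply request this URL with jQuery (e.g. $.getJSON):

<?php
// Minimal JSON endpoint sketch (not the lecture code). Data and field names are invented.
header('Content-Type: application/json; charset=utf-8');

$books = array(
    array('id' => 1, 'title' => 'Learning XSLT'),
    array('id' => 2, 'title' => 'RESTful Web Services'),
);

// optional ?id=N filter
if (isset($_GET['id'])) {
    $id = (int) $_GET['id'];
    $books = array_values(array_filter($books, function ($b) use ($id) {
        return $b['id'] === $id;
    }));
}

echo json_encode($books);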



Friday 15 February 2013

.htaccess file and XML reader example

.htaccess example:

xml reader code and xml file
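The embedded example isn't shown in this post. For reference, a minimal XMLReader sketch - the file and element names are assumptions:

<?php
// Minimal XMLReader sketch (not the original example). XMLReader is a streaming (pull)
// parser, so it copes with large files without building the whole DOM in memory.
// File and element names are assumptions.
$reader = new XMLReader();
$reader->open('books.xml');

while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'book') {
        // grab the current element as SimpleXML for convenient access
        $book = simplexml_load_string($reader->readOuterXml());
        echo (string) $book->title, "\n";
    }
}

$reader->close();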

Wednesday 23 January 2013

Assignment FAQ & Corrections

Corrections


A couple of things pointed out by students on the assignment spec:

  1. Part C) on book suggestions is missing a closing </books> tag in the example data
  2. Part D) on borrowing data asks for the course id as a post parameter. This is not actually needed so can be omitted. (That said, if we wanted to track which students from which courses were using the books, it might be useful in some scenarios).

FAQ

  1. What is the total percentage (approximately) set aside for the optional sections of the assignment?
    It is difficult to give an exact answer on this, as marks will depend to some extent on the quality of the main components. That said, assignments implementing only the main components are unlikely to score 70% or above. So, very approximately, each optional section is worth up to 7.5%.
  2. What is the .htaccess file for?
    This file can contain Apache web server mod_rewrite directives to allow you to create clean (RESTful) URLs for your web service. For example, a URL like www.pets.com/cat/50 can be converted to www.pets.com/index.php?type=cat&id=50. Frameworks such as CodeIgniter tend to have a preconfigured .htaccess file which may well do what you need already, though it's worth having a look at it to see what it is doing. Also see this post for a way to handle it with PHP (a minimal sketch of that approach follows this list).
  3. What is the physical submission for this assignment? 23rd Feb Correction: You will be required to hand in a single-page sheet at the coursework hub, containing a link to your documentation file. Please see Blackboard Announcements for further details.
  4. Is there any reason why all of the book titles have a forward slash at the end of them? Should they be retained or can I get rid of them? These are not needed and are probably an artifact of the way the data was exported originally - you can get rid of them.
  5. Are there any 'set' links we have to use for the client side, or do they just need to be entered into the report? No set links for the client page, but please make sure we have the link to your client URL in the documentation you provide at:
    ../atwd/books/doc/index.html
  6. Does the client side need to have an XML and a JSON version, or just one? You only need to use one of the data formats from your service on the client side - the choice is up to you.
  7. There are other courses in the books data apart from the "CC" ones. This is because the courses list only contains computing-related courses, but some books are common to other departments. Don't worry, as we will test using only the course codes in courses.xml.
  8. What do we mean by "allows full browsing of the dataset"? By this we mean that you can view all the courses, then drill in to see all the books for that course, with the books list linked to a book detail page that also shows the recommendations. You don't need a single page that shows all the books in one view.
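As mentioned in FAQ 2, clean URLs can also be handled in PHP itself rather than with a set of mod_rewrite rules. A minimal sketch - the URL shape follows the /cat/50 example above, everything else is an assumption, and it presumes all requests are routed to this one script (e.g. via a FallbackResource or catch-all RewriteRule directive):

<?php
// Minimal front-controller sketch for clean URLs (not the linked post's code).
// Assumes all requests reach this script, e.g. via "FallbackResource /index.php"
// or a catch-all RewriteRule in .htaccess.
$path  = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);   // e.g. "/cat/50"
$parts = array_values(array_filter(explode('/', $path)));    // array("cat", "50")

$type = isset($parts[0]) ? $parts[0] : null;
$id   = isset($parts[1]) ? (int) $parts[1] : null;

// equivalent of index.php?type=cat&id=50
echo "type = $type, id = $id\n";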

Any other queries? Let us know via the comments.