Sessions
Keynote
09:45 - 10:45
Over the past decade, XML's usage, adoption and application have been both broad and deep. At times, the application of XML verged on the ridiculous, and there have been many experimental 'bumps' on the way to enlightenment. As with any technology riding a decade-long hype cycle, we have made friends and enemies along the way. This presentation is about defining Modern XML in today's context of BigData, fixing XML's 'place' in the dataverse, and acknowledging where and when XML should (or should not) be applied. We will have some fun reviewing some of the XML stereotypes of the past, highlight the XML technologies that have been successful and review some of the larger failures. I will present a range of use cases where using XML, either as the primary format or in some complementary approach, is the right strategic choice. I will also identify scenarios where XML is not the right choice. By doing this, I will set out Modern XML's appropriate problem domain in the context of BigData. Fundamentally, I will illustrate how XML can play to its strengths alongside other data formats (playing to their strengths) when living in a BigData world.
16:30 - 17:30
Jem describes the latest developments in the transformational technology strategy that the BBC Future Media & Technology department is using to evolve from a relational content model and static publishing framework to a fully dynamic semantic publishing (DSP) architecture. This approach uses linked data technology to automate the aggregation, publishing and re-purposing of interrelated content objects according to an ontological, domain-modelled information architecture, providing a greatly improved user experience and high levels of user engagement. The BBC's World Cup website was the first showcase of DSP, and probably the first major implementation of semantic web technologies on a commercial media site. In 2012, the BBC launched three new sites based on DSP: the 2012 Olympics site, a completely redesigned BBC Sport site and the new mobile News site.
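To make "ontology-driven aggregation" concrete, here is a minimal sketch, not the BBC's implementation: hypothetical facts are held as subject-predicate-object triples, and a team page is assembled by following an invented `hasPlayer` relation, so a story tagged only with a player still surfaces on that player's team page. All identifiers and predicate names are assumptions made for the example.

```python
# Hypothetical facts as (subject, predicate, object) triples; every identifier
# and predicate name here is invented for the example.
TRIPLES = [
    ("team:team-a", "hasPlayer", "person:player-1"),
    ("team:team-a", "hasPlayer", "person:player-2"),
    ("story:101", "about", "team:team-a"),
    ("story:102", "about", "person:player-1"),
    ("story:103", "about", "person:player-9"),
]


def objects(triples, subject, predicate):
    """Every o such that (subject, predicate, o) is asserted."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def stories_for(triples, team):
    """Stories tagged with the team itself or with any of its players."""
    topics = {team} | objects(triples, team, "hasPlayer")
    return sorted(s for s, p, o in triples if p == "about" and o in topics)


print(stories_for(TRIPLES, "team:team-a"))  # ['story:101', 'story:102']
```

In a linked-data stack this lookup would typically be a SPARQL query against a triple store; the in-memory version only illustrates how the domain model, rather than manual curation, decides what appears on a page.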
Big Data
11:00 - 12:00
Over the past few years, the NoSQL movement has produced a variety of open-source document stores. Most of them focus on high availability and horizontal scalability, and are designed to run on commodity hardware. These products have gained great traction in the industry for storing large amounts of flexible data (mostly JSON). Meanwhile, XQuery has evolved into a standardized, full-fledged programming language for XML with native support for complex queries, indexes, updates, full-text search and scripting. Moreover, JSON has recently been added to the language as a first-class datatype. Today it is without doubt the most robust and productive technology for processing flexible data. The aim of this talk is to showcase the benefits that can be achieved by integrating the Zorba XQuery Processor with MongoDB. We will introduce the 28msec platform, which seamlessly stores, indexes and manages flexible data entirely in XQuery, while the data itself is stored in MongoDB. The platform leverages MongoDB's indexes, sharding and consistency guarantees to scale out horizontally. The talk will conclude by showing a benchmark of the platform and discussing perspectives on the outlined approach.
"Generating source code with MarkLogic and Hadoop using the Genetic Algorithm"
Jim Fuller - MarkLogic (Gold Sponsor)
12:00 - 13:00
In this presentation I will demonstrate an experimental approach, using MarkLogic and Hadoop to implement a Genetic Algorithm engine that generates 'fit for purpose' computer programs, custom built for particular datasets. These programs generate a data summary, providing scoring and provenance information along the way.
Currently, much of the work being done in the 'bigdata realm' focuses on making tooling easier and on enabling first-order analysis of data sets that were too large to manage with classic approaches.
At MarkLogic we've been doing 'bigdata' for a decade, since well before the term was coined, and many of us are applying these toolsets to an ever-expanding range of problems. Bigdata is a great meme that captures the multiple V's (volume/variability/velocity). It also embeds the notion of dealing with complexity. This presentation is about how we can apply 'Big Data' tools to help us manage and comprehend complexity.
One area in which I have maintained an interest over the years is how to apply the real-time query power of MarkLogic in combination with its Hadoop integration. One particular problem that is amenable to this is analyzing large data sets and automatically generating higher-order programmatic code, which can then be used to transform highly variable data into a more concise, consistent and fact-driven summary. Data summaries are useful for quickly understanding 'what you've got' and can help direct future data mining efforts.
In this presentation we will demonstrate an experimental approach, using MarkLogic and Hadoop to implement a Genetic Algorithm engine that generates 'fit for purpose' computer programs, custom built for particular datasets, to produce a 'comprehension' summary. As data gets larger and more variable, we believe developers will need such tools to manage complexity and integration; with bigdata this problem is further compounded and can become very expensive.
Lastly, we hear time and time again how our clients use MarkLogic to generate actionable 'intelligence', but with the caveat that decision makers do not want a 'yes/no' machine that generates answers to questions without explaining how it arrived at its conclusions. We believe that data comprehension tools could come to represent a whole new class of analysis tool, and we will spend some time describing how our approach could evolve to address end-user needs.
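The genetic-algorithm loop behind such an engine can be sketched in a few lines. This is a deliberately simplified, hypothetical stand-in, not the MarkLogic/Hadoop engine described above: instead of evolving full programs, each genome is a bit mask choosing which fields of some invented, highly variable records go into a summary, and the assumed fitness rewards fields that occur in many records while penalising long summaries. Selection is truncation-based, with uniform crossover, bit-flip mutation and elitism.

```python
import random

# Hypothetical, highly variable records standing in for a large data set.
RECORDS = [
    {"id": 1, "name": "a", "price": 3},
    {"id": 2, "name": "b", "colour": "red"},
    {"id": 3, "price": 5, "note": "x"},
    {"id": 4, "name": "d", "price": 7},
]
# Candidate summary fields; a genome has one bit per field.
FIELDS = sorted({key for record in RECORDS for key in record})
PENALTY = 0.5  # assumed cost per field included in the summary


def selected(genome):
    """Field names switched on in a genome."""
    return [f for f, bit in zip(FIELDS, genome) if bit]


def fitness(genome):
    """Reward fields that occur in many records; penalise long summaries."""
    chosen = selected(genome)
    coverage = sum(sum(f in r for r in RECORDS) for f in chosen)
    return coverage / len(RECORDS) - PENALTY * len(chosen)


def crossover(a, b, rng):
    """Uniform crossover: each bit is taken from either parent."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(genome, rng, rate):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in genome]


def evolve(pop_size=20, generations=40, seed=0):
    rng = random.Random(seed)  # seeded so runs are reproducible
    rate = 1 / len(FIELDS)
    population = [[rng.randint(0, 1) for _ in FIELDS] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng, rate)
            for _ in range(pop_size - 2)
        ]
        population = population[:2] + children  # elitism: keep the two best
    return max(population, key=fitness)


best = evolve()
print(selected(best), fitness(best))
```

A real engine would evolve executable summary programs and evaluate fitness in parallel across the cluster; the loop structure (evaluate, select, recombine, mutate) stays the same.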
Velocity
11:00 - 12:00
The 'Parsing Time' suite is a cloud-based XML parsing and XML management tool, complete with a fully graphical parser, a web services tool and access control. At the heart of our parsing suite is an extremely fast parser capable of parsing XML files of up to 7 GB. The graphical parser allows complex, extensive XPath-style parsing with multi-generational search, regular expressions and mathematical operators, without writing code. Parsing results are shown immediately, so the tool can be used to set up parsing rules quickly and to prevent errors. Our tool can also modify XML files based on search results.
Remote and uploaded XML files can be managed by the parsing suite, and files can be shared within an organization using fine-grained access control based on groups and roles. Encryption and SSL provide security. Our tool also contains a web service component that allows users to consume web services and choose methods graphically, without writing code, even for web services with complex, schema-based components. Finally, our tool enables programmatic access to parsing with dynamic parameters through our generated client code.
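Parsing multi-gigabyte XML files generally means streaming rather than building a full tree in memory. The sketch below, which is not how the Parsing Time suite works internally, uses the standard library's `xml.etree.ElementTree.iterparse` to test each matching element against a predicate and then clears the tree, so memory use stays roughly constant. The `<order>` element and `total` attribute are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def stream_matches(source, tag, predicate):
    """Yield (attributes, text) for each <tag> element satisfying predicate,
    discarding processed elements so memory use stays roughly constant."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            if predicate(elem):
                yield dict(elem.attrib), elem.text
            root.clear()  # drop already-processed children from the tree


# Hypothetical input; with a real multi-GB file, pass an open binary file instead.
data = io.BytesIO(b"""<orders>
  <order id="1" total="250"/>
  <order id="2" total="40"/>
  <order id="3" total="900"/>
</orders>""")
big = [attrs["id"] for attrs, _ in stream_matches(
    data, "order", lambda e: float(e.get("total")) > 100)]
print(big)  # ['1', '3']
```

Clearing the root after each match is the usual idiom for keeping `iterparse` memory bounded; the trade-off is that earlier elements are no longer available for look-back queries.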
12:00 - 13:00
Extending the Internet to embedded devices has created a new network known as the “Internet of Things”. Emerging wireless technologies are being created and standardised that specialise in reducing power consumption at the expense of network bandwidth. Although many established applications such as home automation work well with limited bandwidth, others, such as emergency networks, environmental monitoring networks and other large-scale sensor networks, need to deal with issues of scalability.
In this presentation we will discuss an XML tool that has been designed to support this application domain. The tool utilises XML Schema to provide both validation and compression of the data. We will show how the level of compression achieved outperforms other well-known XML compression techniques in its target application area. We will go on to discuss the development of a lightweight XML messaging framework built on top of this technology and provide an example application.
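The abstract does not describe the tool's encoding, but the general idea behind schema-informed XML compression (also used by the W3C EXI format) can be sketched: when the schema fixes element order and types, the encoder can validate the document, omit tag names and write only typed binary values. Here a hypothetical sensor-reading "schema" is modelled as a Python list of element names and `struct` format codes; this is an illustration, not the presenters' method.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical "schema": fixed child order and a binary type for each element.
SCHEMA = [("node", "H"), ("temp", "f"), ("battery", "B")]
FORMAT = "<" + "".join(code for _, code in SCHEMA)  # little-endian, no padding
CASTS = {"H": int, "B": int, "f": float}


def encode(xml_text):
    """Validate against SCHEMA, then pack only the values (no tag names)."""
    root = ET.fromstring(xml_text)
    children = list(root)
    if [c.tag for c in children] != [name for name, _ in SCHEMA]:
        raise ValueError("document does not match schema")
    values = [CASTS[code](c.text) for c, (_, code) in zip(children, SCHEMA)]
    return struct.pack(FORMAT, *values)


def decode(data, root_tag="reading"):
    """Rebuild the XML from the packed values using the same schema."""
    root = ET.Element(root_tag)
    for (name, _), value in zip(SCHEMA, struct.unpack(FORMAT, data)):
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


doc = "<reading><node>17</node><temp>21.5</temp><battery>88</battery></reading>"
packed = encode(doc)
print(len(doc), "->", len(packed))  # 72 -> 7
print(decode(packed) == doc)
```

Because both sides share the schema, only 7 bytes of values travel over the constrained link. A practical scheme would also handle optional elements, repetition and message framing.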
Searching
14:00 - 15:00
In this talk I present an integration of the Lucene search engine with EMC Documentum xDB, a native XML database. I introduce a new Lucene MultiPath Index (LMPI), built on top of Lucene technology, for evaluating complex XQuery queries efficiently. I describe the general architecture of the index, including the concept of SubPaths, the mapping of SubPaths into Lucene fields, and query optimization strategies.
I also present a new approach to integrating Lucene technology into a transactional database at the storage level: Lucene files are stored in xDB data pages instead of the file system, and Lucene accesses all the files through the xDB buffer pool instead of just the operating system buffer cache. This approach significantly simplifies the implementation of traditional database features for LMPI within xDB, such as transaction isolation, rollbacks, recovery after database crashes, snapshot construction, replication, hot backups and buffer management. Finally, I cover performance analysis of LMPI for queries and ingest operations, performance-tuning tips and future optimization techniques in this area.
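The abstract does not define SubPaths precisely. One plausible reading, assumed here purely for illustration, is that every suffix of an element's root-to-leaf path becomes a field name, so a query such as `//book/title = 'XML'` can be answered with a single term lookup. The sketch uses a plain Python dictionary as the inverted index in place of Lucene; all names are hypothetical.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET


def leaf_paths(elem, prefix=()):
    """Yield (path tuple, text) for every element carrying text.
    Tail text and attributes are ignored for brevity."""
    path = prefix + (elem.tag,)
    if elem.text and elem.text.strip():
        yield path, elem.text.strip()
    for child in elem:
        yield from leaf_paths(child, path)


def subpaths(path):
    """All suffixes of a path, e.g. (a, b, c) -> a/b/c, b/c, c."""
    return ["/".join(path[i:]) for i in range(len(path))]


def build_index(docs):
    """Map 'subpath=value' terms to the set of document ids containing them."""
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        for path, text in leaf_paths(ET.fromstring(xml_text)):
            for field in subpaths(path):
                index[f"{field}={text}"].add(doc_id)
    return index


docs = {
    1: "<library><book><title>XML</title></book></library>",
    2: "<shop><book><title>XML</title></book><dvd><title>XML</title></dvd></shop>",
}
index = build_index(docs)
print(sorted(index["book/title=XML"]))  # [1, 2]
print(sorted(index["dvd/title=XML"]))   # [2]
```

Indexing every suffix trades extra index size for the ability to answer descendant-axis path predicates without walking the documents.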
15:00 - 16:00
Semantic Technologies (and Linked Data in particular) provide various opportunities for reducing the cost and complexity of information integration, processing and management within and across enterprises, but at the same time there are certain challenges associated with such technologies. This talk will provide details on the successful application of Semantic Technologies and Linked Data in several industrial and research Big Data scenarios.
XML
14:00 - 15:00
This talk focuses on the security impact of different aspects of XML processing. It includes many demonstrations, based on nearly two years of hands-on research and dozens of published or pending security vulnerabilities.
Applications impacted by this research:
- Browsers: Internet Explorer, WebKit, Firefox, Opera
- Databases: Oracle, Postgres
- SVG processing: Inkscape, Apache Batik
- Misc: Adobe Reader, SharePoint, PHP, xmlsec, MoinMoin, Restlet
Automatic stress testing of XML and XSLT processors:
- Introduction to mutation-based fuzzing
- Fuzzing testbeds: CLI and Web
- Examples of findings
XML as a container: XDP
- Introduction to the XDP format
- Hiding malicious PDF from anti-virus products using XDP
DTD: stealing local files and accessing the network
- Basic vulnerability using XML External Entities
- Processor-specific features: Windows, Java, PHP
XSLT: abuse of legitimate features
- Identification of dangerous XSLT features:
  - per version of the specification
  - per processor
- Advanced exploitation of these features
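A common defence against the XML External Entity attacks listed above is to refuse DTDs in untrusted input altogether. The sketch below uses Python's standard `xml.parsers.expat` module: the DOCTYPE and entity-declaration handlers raise an exception, which aborts parsing before any entity can be declared or expanded. The `ForbiddenDTD` class and `parse_without_dtd` function are hypothetical names; in production Python code, the third-party `defusedxml` package applies this kind of policy.

```python
import xml.parsers.expat


class ForbiddenDTD(ValueError):
    """Raised when a document contains a DOCTYPE or entity declaration."""


def _reject(*_args):
    raise ForbiddenDTD("DTDs are not accepted")


def parse_without_dtd(data):
    """Parse untrusted XML bytes, refusing any DTD; return element names seen."""
    parser = xml.parsers.expat.ParserCreate()
    names = []
    parser.StartDoctypeDeclHandler = _reject  # fires as soon as <!DOCTYPE starts
    parser.EntityDeclHandler = _reject        # belt and braces for entity decls
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(data, True)  # exceptions raised in handlers propagate here
    return names


print(parse_without_dtd(b"<r><a/></r>"))  # ['r', 'a']

evil = b'<!DOCTYPE r [<!ENTITY x SYSTEM "file:///etc/passwd">]><r>&x;</r>'
try:
    parse_without_dtd(evil)
except ForbiddenDTD as exc:
    print("rejected:", exc)
```

Rejecting DTDs outright is stricter than merely disabling external entity resolution, but it also blocks internal-entity expansion attacks, at the cost of refusing legitimate documents that carry a DOCTYPE.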
15:00 - 16:00
Since the introduction of XSLT 1.0 processing in the browser, the web has moved on: the range of exploitable XML resources has grown enormously, and users now expect a richer, more interactive and responsive browsing experience.
This presentation will demonstrate how, using the browser-based Saxon-CE XSLT 2.0 processor, either whole web pages or selected parts of a page can be updated dynamically using XSLT templates bound to specific user events. It will go on to show how this client-side capability can be used to provide a cohesive, responsive visualization of large sets of XML resources.



