
E-book: Engineering Agile Big-Data Systems


DRM restrictions

  • Copying: not allowed

  • Printing: not allowed

  • E-book usage:

    Digital Rights Management (DRM)
    The publisher has supplied this book in encrypted form, which means that free software must be installed before it can be unlocked and read. To read this e-book, you need to create an Adobe ID. More information here. The e-book can be downloaded to up to 6 devices (one user with the same Adobe ID).

    Required software
    To read this e-book on a mobile device (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

    To read this e-book on a PC or Mac, you need Adobe Digital Editions (a free application designed specifically for e-books; it is not the same as Adobe Reader, which you probably already have on your computer).

    You cannot read this e-book on an Amazon Kindle.

To be effective, data-intensive systems require extensive ongoing customization to reflect changing user requirements, organizational policies, and the structure and interpretation of the data they hold. Manual customization is expensive, time-consuming, and error-prone. In large complex systems, the value of the data can be such that exhaustive testing is necessary before any new feature can be added to the existing design. In most cases, the precise details of requirements, policies and data will change during the lifetime of the system, forcing a choice between expensive modification and continued operation with an inefficient design.

Engineering Agile Big-Data Systems outlines an approach to dealing with these problems in software and data engineering, describing a methodology for aligning these processes throughout product lifecycles. It discusses tools which can be used to achieve these goals, and, in a number of case studies, shows how the tools and methodology have been used to improve a variety of academic and business systems.
Preface xv
Acknowledgements xvii
List of Contributors xix
List of Figures xxi
List of Tables xxix
List of Abbreviations xxxi
1 Introduction 1(20)
1.1 State of the Art in Engineering Data-Intensive Systems 2(3)
1.1.1 The Challenge 4(1)
1.2 State of the Art in Semantics-Driven Software Engineering 5(3)
1.2.1 The Challenge 8(1)
1.3 State of the Art in Data Quality Engineering 8(4)
1.3.1 The Challenge 11(1)
1.4 About ALIGNED 12(3)
1.5 ALIGNED Partners 15(2)
1.5.1 Trinity College Dublin 15(1)
1.5.2 Oxford University - Department of Computer Science 15(1)
1.5.3 Oxford University - School of Anthropology and Museum Ethnography 15(1)
1.5.4 University of Leipzig - Agile Knowledge Engineering and Semantic Web (AKSW) 15(1)
1.5.5 Semantic Web Company 16(1)
1.5.6 Wolters Kluwer Germany 16(1)
1.5.7 Adam Mickiewicz University in Poznan 16(1)
1.5.8 Wolters Kluwer Poland 17(1)
1.6 Structure 17(4)
2 ALIGNED Use Cases - Data and Software Engineering Challenges 21(20)
Arkadiusz Marciniak
Patrycja Filipowicz
2.1 Introduction 21(3)
2.2 The ALIGNED Use Cases 24(9)
2.2.1 Seshat: Global History Databank 24(2)
2.2.2 PoolParty Enterprise Application Demonstrator System 26(1)
2.2.3 DBpedia 27(2)
2.2.4 Jurion and Jurion IPG 29(2)
2.2.5 Health Data Management 31(2)
2.3 The ALIGNED Use Cases and Data Life Cycle. Major Challenges and Offered Solutions 33(3)
2.4 The ALIGNED Use Cases and Software Life Cycle. Major Challenges and Offered Solutions 36(3)
2.5 Conclusions 39(2)
3 Methodology 41(38)
James Welch
Jim Davies
Kevin Feeney
Pieter Francois
Jeremy Gibbons
Seyyed Shah
3.1 Introduction 41(2)
3.2 Software and Data Engineering Life Cycles 43(6)
3.2.1 Software Engineering Life Cycle 43(4)
3.2.2 Data Engineering Life Cycle 47(2)
3.3 Software Development Processes 49(4)
3.3.1 Model-Driven Approaches 49(2)
3.3.2 Formal Techniques 51(1)
3.3.3 Test-Driven Development 52(1)
3.4 Integration Points and Harmonisation 53(7)
3.4.1 Integration Points 54(1)
3.4.2 Barriers to Harmonisation 55(3)
3.4.3 Methodology Requirements 58(2)
3.5 An ALIGNED Methodology 60(5)
3.5.1 A General Framework for Process Management 60(3)
3.5.2 An Iterative Methodology and Illustration 63(2)
3.6 Recommendations 65(4)
3.6.1 Sample Methodology 66(3)
3.7 Sample Synchronisation Point Activities 69(5)
3.7.1 Model Catalogue: Analysis and Search/Browse/Explore 70(1)
3.7.2 Model Catalogue: Design and Classify/Enrich 71(1)
3.7.3 Semantic Booster: Implementation and Store/Query 72(1)
3.7.4 Semantic Booster: Maintenance and Search/Browse/Explore 72(2)
3.8 Summary 74(2)
3.8.1 Related Work 74(2)
3.9 Conclusions 76(3)
4 ALIGNED MetaModel Overview 79(46)
Rob Brennan
Bojan Bozic
Odhran Gavin
Monika Solanki
4.1 Generic Metamodel 80(3)
4.1.1 Basic Approach 80(1)
4.1.2 Namespaces and URIs 81(1)
4.1.3 Expressivity of Vocabularies 82(1)
4.1.4 Reference Style for External Terms 82(1)
4.1.5 Links with W3C PROV 82(1)
4.2 ALIGNED Generic Metamodel 83(1)
4.2.1 Design Intent Ontology (DIO) 83(1)
4.3 Software Engineering 83(3)
4.3.1 Software Life Cycle Ontology 83(2)
4.3.2 Software Implementation Process Ontology (SIP) 85(1)
4.4 Data Engineering 86(1)
4.4.1 Data Life Cycle Ontology 86(1)
4.5 DBpedia DataID (DataID) 87(2)
4.6 Unified Quality Reports 89(36)
4.6.1 Reasoning Violation Ontology (RVO) Overview 89(2)
4.6.2 W3C SHACL Reporting Vocabulary 91(2)
4.6.3 Data Quality Vocabulary 93(3)
4.6.4 Test-Driven RDF Validation Ontology (RUT) 96(13)
4.6.5 Enterprise Software Development (DIOPP) 109(2)
4.6.6 Unified Governance Domain Ontologies 111(1)
4.6.7 Semantic Booster and Model Catalogue Domain Ontology 112(1)
4.6.7.1 Model catalogue 112(1)
4.6.7.2 Booster 113(1)
4.6.8 PROV 113(2)
4.6.9 SKOS 115(2)
4.6.10 OWL 117(2)
4.6.11 RDFS 119(2)
4.6.12 RDF 121(4)
5 Tools 125(76)
Kevin Feeney
Christian Dirschl
Katja Eck
Dimitris Kontokostas
Gavin Mendel-Gleason
Helmut Nagy
Christian Mader
Andreas Koller
5.1 Model Catalogue 125(30)
5.1.1 Introduction 125(2)
5.1.2 Model Catalogue 127(11)
5.1.2.1 Architecture 127(3)
5.1.2.2 Searching and browsing the catalogue 130(1)
5.1.2.3 Editing the catalogue contents 131(3)
5.1.2.4 Administration 134(1)
5.1.2.5 Eclipse integration and model-driven development 134(2)
5.1.2.6 Semantic reasoning 136(1)
5.1.2.7 Automation and search 137(1)
5.1.3 Semantic Booster 138(17)
5.1.3.1 Introduction 138(1)
5.1.3.2 Semantic Booster 139(16)
5.2 RDFUnit 155(9)
5.2.1 RDFUnit Integration 157(7)
5.2.1.1 JUnit XML report-based integration 158(1)
5.2.1.2 Custom Apache Maven-based integration 158(2)
5.2.1.3 The Shapes Constraint Language (SHACL) 160(1)
5.2.1.4 Comparison of SHACL to schema definition using RDFUnit test patterns 161(1)
5.2.1.5 Comparison of SHACL to auto-generated RDFUnit tests from RDFS/OWL axioms 162(1)
5.2.1.6 Progress on the SHACL specification and standardisation process 163(1)
5.2.1.7 SHACL support in RDFUnit 163(1)
5.3 Expert Curation Tools and Workflows 164(8)
5.3.1 Requirements 165(2)
5.3.1.1 Graduated application of semantics 165(1)
5.3.1.2 Graph-object mapping 165(1)
5.3.1.3 Object/document level state management and versioning 166(1)
5.3.1.4 Object-based workflow interfaces 166(1)
5.3.1.5 Integrated, automated, constraint validation 166(1)
5.3.1.6 Result interpretation 167(1)
5.3.1.7 Deferred updates 167(1)
5.3.2 Workflow/Process Models 167(5)
5.3.2.1 Process model 1: linked data object creation 167(1)
5.3.2.2 Process model 2: linked data object updates 168(1)
5.3.2.3 Process model 3: updates to deferred updates 168(1)
5.3.2.4 Process model 4: schema updates 169(1)
5.3.2.5 Process model 5: validating schema updates 170(1)
5.3.2.6 Process model 6: named graph creation 170(1)
5.3.2.7 Process model 7: instance data updates and named graphs 171(1)
5.4 Dacura Approval Queue Manager 172(1)
5.5 Dacura Linked Data Object Viewer 172(4)
5.5.1 CSP Design of Seshat Workflow Use Case 173(1)
5.5.2 Specification 174(2)
5.6 Dacura Quality Service 176(8)
5.6.1 Technical Overview of Dacura Quality Service 177(1)
5.6.2 Dacura Quality Service API 178(6)
5.6.2.1 Resource and interchange format 178(1)
5.6.2.2 URI 178(1)
5.6.2.3 Literals 178(1)
5.6.2.4 Literal types 178(1)
5.6.2.5 Quads 179(1)
5.6.2.6 POST variables 180(1)
5.6.2.7 Tests 180(1)
5.6.2.8 Required schema tests 180(1)
5.6.2.9 Schema tests 181(1)
5.6.2.10 Errors 182(1)
5.6.2.11 Endpoints 182(2)
5.7 Linked Data Model Mapping 184(11)
5.7.1 Interlink Validation Tool 184(6)
5.7.1.1 Interlink validation 185(2)
5.7.1.2 Technical overview 187(1)
5.7.1.3 Configuration via iv_config.txt 188(1)
5.7.1.4 Configuration via external_datasets.txt 189(1)
5.7.1.5 Executing the interlink validator tool 190(1)
5.7.2 Dacura Linked Model Mapper 190(3)
5.7.3 Model Mapper Service 193(2)
5.7.3.1 Modelling tool - creating mappings 193(1)
5.7.3.2 Importing semi-structured data with the data harvesting tool 193(2)
5.8 Model-Driven Data Curation 195(6)
5.8.1 Dacura Quality Service Frame Generation 196(1)
5.8.2 Frames for User Interface Design 197(1)
5.8.3 Semi-Formal Frame Specification 197(2)
5.8.4 Frame API Endpoints 199(2)
6 Use Cases 201(104)
Kevin Feeney
Christian Dirschl
Andreas Koller
James Welch
Dimitris Kontokostas
Pieter Francois
Sabina Lobocka
Piotr Bledzki
6.1 Wolters Kluwer - Re-Engineering a Complex Relational Database Application 201(34)
6.1.1 Introduction 201(1)
6.1.2 Problem Statement 202(2)
6.1.3 Actors 204(2)
6.1.4 Implementation 206(9)
6.1.4.1 PoolParty notification extension 206(1)
6.1.4.2 rsine notification extension 206(1)
6.1.4.2.1 Results 206(1)
6.1.4.3 RDFUnit for data transformation 207(4)
6.1.4.4 PoolParty external link validity 211(3)
6.1.4.5 Statistical overview 214(1)
6.1.5 Evaluation 215(4)
6.1.5.1 Productivity 217(1)
6.1.5.2 Quality 217(1)
6.1.5.3 Agility 217(1)
6.1.5.4 Measuring overall value 218(1)
6.1.5.5 Data quality dimensions and thresholds 218(1)
6.1.5.6 Model agility 219(1)
6.1.5.7 Data agility 219(1)
6.1.6 JURION IPG 219(16)
6.1.6.1 Introduction 219(6)
6.1.6.2 Architecture 225(2)
6.1.6.3 Tools and features 227(1)
6.1.6.4 Implementation 228(4)
6.1.6.5 Evaluation 232(2)
6.1.6.6 Experimental evaluation 234(1)
6.2 Seshat - Collecting and Curating High-Value Datasets with the Dacura Platform 235(24)
6.2.1 Use Case 237(1)
6.2.1.1 Problem statement 237(1)
6.2.2 Architecture 238(2)
6.2.2.1 Tools and features 240(1)
6.2.3 Implementation 240(6)
6.2.3.1 Dacura data curation platform 240(1)
6.2.3.2 General description 240(1)
6.2.3.3 Detailed process 241(5)
6.2.4 Overview of the Model Catalogue 246(7)
6.2.4.1 Model catalogue in the demonstrator system 250(3)
6.2.5 Seshat Trial Platform Evaluation 253(6)
6.2.5.1 Measuring overall value 253(1)
6.2.5.2 Data quality dimensions and thresholds 253(6)
6.3 Managing Data for the NHS 259(13)
6.3.1 Introduction 259(1)
6.3.2 Use Case 260(1)
6.3.2.1 Quality 260(1)
6.3.2.2 Agility 260(1)
6.3.3 Architecture 261(2)
6.3.4 Implementation 263(5)
6.3.4.1 Model catalogue 263(1)
6.3.4.2 NIHR health informatics collaborative 263(5)
6.3.5 Evaluation 268(4)
6.3.5.1 Productivity 269(2)
6.3.5.2 Quality 271(1)
6.3.5.3 Agility 272(1)
6.4 Integrating Semantic Datasets into Enterprise Information Systems with PoolParty 272(30)
6.4.1 Introduction 272(2)
6.4.2 Problem Statement 274(1)
6.4.2.1 Actors 274(1)
6.4.3 Architecture 274(2)
6.4.4 Implementation 276(8)
6.4.4.1 Consistency violation detector 276(1)
6.4.4.2 RDFUnit test generator 277(1)
6.4.4.3 PoolParty integration 277(1)
6.4.4.4 Notification adaptations 277(1)
6.4.4.5 RDFUnit 278(1)
6.4.4.6 Validation on import 278(6)
6.4.5 Results 284(11)
6.4.5.1 RDF constraints check 285(1)
6.4.5.2 RDF validation 286(3)
6.4.5.3 Improved notifications 289(4)
6.4.5.4 Unified governance 293(2)
6.4.6 Evaluation 295(7)
6.4.6.1 Measuring overall value 295(4)
6.4.6.2 Data quality dimensions and thresholds 299(1)
6.4.6.3 Evaluation tasks 300(2)
6.5 Data Validation at DBpedia 302(11)
6.5.1 Introduction 302(1)
6.5.2 Problem Statement 302(1)
6.5.2.1 Actors 303(1)
6.5.3 Architecture 303(1)
6.5.4 Tools and Features 304(1)
6.5.5 Implementation 305(4)
6.5.6 Evaluation 309(4)
6.5.6.1 Productivity 309(1)
6.5.6.2 Quality 310(2)
6.5.6.3 Agility 312
7 Evaluation 305(20)
Pieter Francois
Stephanie Grohmann
Katja Eck
Odhran Gavin
Andreas Koller
Helmut Nagy
Christian Dirschl
Peter Turchin
Harvey Whitehouse
7.1 Key Metrics for Evaluation 313(5)
7.1.1 Productivity 315(1)
7.1.2 Quality 316(1)
7.1.3 Agility 316(1)
7.1.4 Usability 317(1)
7.2 ALIGNED Ethics Processes 318(2)
7.3 Common Evaluation Framework 320(3)
7.3.1 Productivity 320(1)
7.3.2 Quality 320(1)
7.3.3 Agility 321(2)
7.4 ALIGNED Evaluation Ontology 323(2)
Appendix A Requirements 325(70)
Index 395(4)
About the Editors 399
Kevin Feeney, Jim Davies, James Welch