DB Lecture 1
Course
Organization
- Moodle for communication, incl. cancellations
 - In-person teaching (Präsenzlehre)
 - Slides online?
 - Questions anytime during/after lectures
 - Office hours 13:40 - 14:00 (?)
 - Exam (Klausur) answers can be in English because the slides are in English
 
Prerequisites
- modern relational DB
 - SQL?
 - set theory (Mengenlehre)
 
Lab/Exercises (Praktikum/Übung)
- Solve tasks with a NoSQL DB - data import, queries
 - Computer pool with PostgreSQL and VMs
 - Possible to test from home through VPN?
 
Score
- Points from exercises?
- she suggests attending all of them
 
 - Written exam (?), may change
 
Other
- Library - in slide
 - Master's thesis possible
- Current topics in data management for DS (life-science ontologies, semantic annotation)
 - PP - temporal data
 - DS/scientific workflows
 - “on my website I have topics of theses I have supervised”
- TODO website
 
 
 
Content
- DBs,  esp. NoSQL
- NoSQL
- categories, properties
 - NoSQL vs relational DB
 
 - partitioning, consistency, replication
 - storage, retrieval
 
 - NoSQL
 - key-value and document stores
 - wide column / record stores
 - search in large data sets
 
Relevance & context
- relevance
- relevant
 
 - pipeline:
- steps
- Data acquisition - getting / generating
 - Cleaning
 - Integration
 - Analysis - classical DS/ML
 - Evaluation
 - Interpretation
 
 - steps in italics - 80%
 
 - steps
 - choosing
- trade-offs: sometimes speed matters more than consistency, etc.
 
 
Chapter 1. - Recap
Basics
- D/DB System
- system to persistently store and manage (sometimes large) amounts of data
 - realization of data-intensive apps
 
 - Why?
- avoid data loss
 - many users
 - long-term storage
 
 
Data Model
- D/Data Model is a model that describes in an abstract way how data is represented in an information or DB system
- a system of concepts and their interrelations
 - the “language” used to describe data
 - syntax and semantics
 - fundamental to other aspects like integrity etc.
 - Example:
- Java: objects of classes that have attributes and references to other objects + methods to access them
 - relational DB - structured tables of tuples with attributes, foreign keys, constraints, etc.
- tuple here is a row (Zeile) in a DB?
 - TODO L/ Zeile (row), Spalte (column)
 
 - there are also hierarchical etc.
 
 
 
Information system
- D/Database management system (DBMS) - software system to define, manage, process and analyze DB data
- (DB is the data itself)
 
 - DB <-> [database management system (DBMS)]
Example: File system for data management
- why not files/folders?
- two files, one with student name, the other with marks etc.
 - redundancy/inconsistency
- updating / renaming names or subjects (Fächer) is hard to do across multiple files
 
 - increased storage requirements
 - no central data storage
- every application manages its own data
 - no data independence: applications depend on the file structure
 - programmers have to know about the storage and internal data representation
 
 - multiple users working on the same data
 - no guarantees w.r.t. data protection / privacy
- access rights
 
 - /IL “would want to grant separate access” (“getrennten Zugriff geben wollen würde”)
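
The redundancy/inconsistency problem from the list above can be reproduced in a few lines. This is only a sketch: the two "files", their columns and the student names are made up for illustration, and `rename_student` and `names` are hypothetical helpers, not anything from the lecture.

```python
import csv
import io

# Two hypothetical "files", each maintained by a different application.
students_file = "id,name\n1,Ada Lovelace\n2,Edgar Codd\n"
marks_file = "name,course,mark\nAda Lovelace,DB,1.0\nEdgar Codd,DB,1.3\n"


def rename_student(students_csv, old, new):
    """Rename a student in the students file only, as a careless update would."""
    return students_csv.replace(old, new)


def names(csv_text):
    """Return the set of values in the 'name' column of a CSV text."""
    return {row["name"] for row in csv.DictReader(io.StringIO(csv_text))}


students_file = rename_student(students_file, "Ada Lovelace", "Ada King")

# Names that still appear in the marks file but no longer match any student.
orphaned = names(marks_file) - names(students_file)
print(orphaned)
```

Because the name is stored redundantly, updating one file silently leaves the other one stale; nothing in the file system notices the mismatch.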
 
 
 
Transaction
- Interface for transactions
 1. begin of transaction (BOT)
 2. commit transaction (COMMIT WORK in SQL)
 3. rollback transaction (ROLLBACK WORK in SQL)
 - D/DML (Data manipulation language)
- has commands like INSERT, UPDATE, DELETE, etc. in SQL
 - possible terminations of a transaction
- BOT, DML[1], DML[2], ... DML[k] - can be multiple operations, but encapsulated as a single transaction!
- Two options:
- Normal: COMMIT WORK
 - Abnormal: ROLLBACK WORK (e.g. integrity constraint violated)
 - Enforced rollback (e.g. power outage)
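
A minimal sketch of the BOT / DML / COMMIT-or-ROLLBACK pattern above, using Python's built-in `sqlite3` module. Assumptions: the `account` table, its CHECK constraint and the `transfer` helper are invented for illustration, and SQLite spells the commands `BEGIN`, `COMMIT` and `ROLLBACK` rather than `COMMIT WORK` / `ROLLBACK WORK`.

```python
import sqlite3

# isolation_level=None puts the sqlite3 module in autocommit mode, so the
# explicit BEGIN / COMMIT / ROLLBACK statements below control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # integrity constraint
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; True if committed."""
    conn.execute("BEGIN")  # BOT
    try:
        # Credit first on purpose: if the debit fails, a partial change
        # exists that the rollback has to undo.
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
        )  # DML[1]
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
        )  # DML[2]
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # abnormal termination: constraint violated
        return False
    conn.execute("COMMIT")  # normal termination
    return True


def balances(conn):
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


ok = transfer(conn, 1, 2, 30)       # fits the constraint
failed = transfer(conn, 1, 2, 500)  # would make the balance negative
print(ok, failed, balances(conn))
```

The second transfer violates the CHECK constraint on its second statement, so the already executed credit is rolled back as well: the two updates succeed or fail together.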
 
 
ACID
- RDBMS ensure ACID properties for transactions
 - D/ACID
- Atomicity
- “all or nothing” property
 - if one part fails, the entire transaction fails and the DB state is left unchanged
 
 - Consistency
- A successful transaction preserves DB consistency
 - /? Definition of integrity constraints
 
 - Isolation
- Concurrent execution of transactions results in a system state as if they were executed serially
 - A transaction can’t rely on intermediate or unfinished state.
 - Won’t it be slow?
- Sometimes you can sacrifice part of this, esp. for distributed systems (verteilte Systeme)
 - Durability
- Once a transaction has committed, its changes persist, even after a crash
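
To make the isolation point concrete, here is a small sketch with two connections to the same SQLite file. Assumptions: the table `t`, the file name and the `row_count` helper are hypothetical, and the behaviour shown is that of SQLite in its default journal mode; other DBMSs and isolation levels can behave differently.

```python
import os
import sqlite3
import tempfile

# A file-backed database so that two connections can share it.
path = os.path.join(tempfile.mkdtemp(), "isolation_demo.db")
writer = sqlite3.connect(path, isolation_level=None)  # explicit BEGIN/COMMIT below
reader = sqlite3.connect(path, isolation_level=None)

writer.execute("CREATE TABLE t (x INTEGER)")
writer.execute("INSERT INTO t VALUES (1)")


def row_count(conn):
    """Number of rows in t as seen through the given connection."""
    return conn.execute("SELECT COUNT(*) FROM t").fetchall()[0][0]


writer.execute("BEGIN")
writer.execute("INSERT INTO t VALUES (2)")  # not yet committed
seen_by_writer = row_count(writer)          # the transaction sees its own change
seen_by_reader = row_count(reader)          # others still see the committed state
writer.execute("COMMIT")
seen_after_commit = row_count(reader)       # visible to others only after commit

writer.close()
reader.close()
print(seen_by_writer, seen_by_reader, seen_after_commit)
```

The reader never observes the intermediate state of the writer's transaction, which is the property the notes describe; the price is locking or versioning work inside the DBMS, which is why some systems relax it.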
 
 
 
 
3-level schema architecture
- Slide ??
 - TODO picture
 - Describes abstraction steps:
- /? TODO
 - D/Logical data independence: changes to the logical schema must not require changes to applications (external schemas) built on it
- If I rename a column (Spalte), programs shouldn’t require changes
 
 - D/Physical data independence: changes to the internal schema (how data is stored) shouldn’t require changes to the logical schema
- Programs that use it shouldn’t have to change
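
One common way to get logical data independence is to let applications query a view instead of the base table. The sketch below assumes a `Kunden` table with a hypothetical `Name` column, a hypothetical view `KundenView` and a hypothetical new column name `CustomerNo`; the rename is done by rebuilding the table so that it does not depend on a particular SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Kunden (KdNr INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    INSERT INTO Kunden VALUES (1, 'Ada'), (2, 'Edgar');
    -- External schema: the application only ever sees this view.
    CREATE VIEW KundenView AS SELECT KdNr, Name FROM Kunden;
""")

# Application code: this query is never edited.
APP_QUERY = "SELECT KdNr, Name FROM KundenView ORDER BY KdNr"

before = conn.execute(APP_QUERY).fetchall()

# Logical schema change: column KdNr becomes CustomerNo.
conn.executescript("""
    DROP VIEW KundenView;
    CREATE TABLE Kunden_new (CustomerNo INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    INSERT INTO Kunden_new SELECT KdNr, Name FROM Kunden;
    DROP TABLE Kunden;
    ALTER TABLE Kunden_new RENAME TO Kunden;
    -- Only the mapping from external to logical schema is adapted.
    CREATE VIEW KundenView AS SELECT CustomerNo AS KdNr, Name FROM Kunden;
""")

after = conn.execute(APP_QUERY).fetchall()
print(before == after)
```

The column was renamed in the logical schema, but the application query still runs unchanged because only the view definition had to be updated.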
 
 
 - Example DB
- External schema:
- website or application that uses/views parts of some tables
 - Conceptual:
- Table “Booking” has columns like KdNr (int); you can then look up the customer (Kunde) in the table “Kunden”, etc.
 - Internal:
- Customer[KdNr: int]
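
The three levels of the example can be sketched in SQLite roughly as follows. Only the tables “Booking” and “Kunden” and the column `KdNr` come from the notes; the other columns, the view `BookingOverview`, the index name and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
    -- Conceptual level: tables and their relationships.
    CREATE TABLE Kunden (KdNr INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Booking (
        BookingNr INTEGER PRIMARY KEY,
        KdNr      INTEGER NOT NULL REFERENCES Kunden(KdNr),
        Item      TEXT NOT NULL
    );
    INSERT INTO Kunden VALUES (7, 'Ada'), (8, 'Edgar');
    INSERT INTO Booking VALUES (1, 7, 'Room 12'), (2, 8, 'Room 3'), (3, 7, 'Room 5');

    -- External level: what a website or application gets to see.
    CREATE VIEW BookingOverview AS
        SELECT b.BookingNr, k.Name, b.Item
        FROM Booking AS b JOIN Kunden AS k ON k.KdNr = b.KdNr;
""")

QUERY = "SELECT BookingNr, Name, Item FROM BookingOverview ORDER BY BookingNr"
rows_before = conn.execute(QUERY).fetchall()

# Internal level: an access-path decision that no query ever mentions.
conn.execute("CREATE INDEX idx_booking_kdnr ON Booking (KdNr)")
rows_after = conn.execute(QUERY).fetchall()
print(rows_before == rows_after)
```

Adding the index changes how the data is stored and found, yet the view and the query stay untouched, which is physical data independence in miniature.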
 
				