DB Lecture 1
Course
Orga
- moodle for communication, incl. cancellations
- Präsenzlehre
- Folien online?
- Questions anytime during/after lectures
- Sprechstunde 13:40 - 14:00 (?)
- Answers to Klausur can be in English weil die Folien sind in English
Prerequisites
- modern relationa DB
- SQL?
- /D Mengenlehre
Praktikum/Übung
- Solve tasks with NoQL db - data import, queries
- Computer pool with PostgresSQL and VMs
- Possible to test from home through VPN?
Score
- Points from Übungen?
- she suggests to visit all of them
- Written (?) may change
Other
- Library - in slide
- Masterthesis possible
- Current topics in data mgmt for DS (life sci. ontologies, semantic annotation)
- PP - temporal data
- DS/scientific workflows
- “on my website I have topics of theses ich habe betreut”
- TODO website
Content
- DBs, esp. NoSQL
- NoSQL
- categories, properties,
- NoSQL vs relational DB
- partitioning, consistency, repliction
- storage, retrieval
- NoSQL
- key-value and document stores
- wide column / record stores
- search in large data sets
Relevance & context
- relevance
- relevant
- pipeline:
- steps
- /D daten-acquise - getting / generating
- Cleaning
- Integration
- Analyse - classical ds/ml
- Evaluierung
- Interpretierung
- cursive - 80%
- steps
- choosing
- tradeoffs: sometimes you need to have it fast more than consistency etc.
Chapter 1. - Recap
Basics
- D/DB System
- system to persistently store and manage large data sometimes
- realization of data-intensive apps
- Why?
- avoid data loss
- many users
- long-term storage
Data Model
- D/Data Model is a model that descriiract way how data is represented in an info or DB system
- a system of concepts and their interrelatiosn
- the “language” used to describe data
- syntax and semantic
- fundamental to other bits like integrity etc.
- Example:
- Java: objects of classes that have attributes and references to other obj + methods to access
- relational db - structures tables of tuples with attrs, foreign keys, constraints etc.
- tuple here is a row/Zeile in a DB?
- TODO L/ Zeile, Spalte
- there are also hierarchical etc.
Information system
- D/Database management system (DBMS) - software system to define, manage ,process and analyze DB data
- (DB is the data itself)
-
DB <-> [database management system (DBMS)]
- Example: File system for data management
- why not files/folders?
- two files, one with student name, the other with marks etc.
- redundancy/inconsistency
- updating / renaming names/Fächer is hard to do in multiple files
- increased storage reqs
- no central data storage
- every application manages their own data independence
- dep on file structure
- programmers have to know about the storage and internal data repr.
- multiple users working on the same data
- no waranties wrt data protection / privacy
- access rights
- /IL “gentrennte Zugriff geben wollen würde”
- why not files/folders?
Transaction
- Interface for transactions 1. begin of transaction (BOT) 2. commit transaction (COMMIT WORK in SQL) 3. rollback trans. (ROLLBACK WORK in SQL)
- D/DML (Data manipulation language)
- has commands like
INSERT, UPDATE, DELETE
, etc. in SQL
- has commands like
- possible terminations of a transaction
BOT
,DML[1], DML[2], ... DML[k]
,- can be multiple, but capsuled as single transaction!
- Two options:
- Normal:
COMMIT WORK
- Abnormal:
ROLLBACK WORK
(zB Integritätsbedingung verletzt)- Enforced rollback (Stromausfall)
- Normal:
ACID
- RDBMS ensure ACID properties for transactions
- D/ACID
- Atomicity
- “all or nothing” property
- if one part fails, the entire transaction fails and the DB state is left unchanged
- Consistency
- A succesful transaction preserves DB consistency
- /? Definition of integrity constraints
- Isolation
- Concurrent execution of transactions results in a system state as if they were executed serially
- T. can’t rely on interm. or unfinished state.
- Won’t it be slow?
- Sometimes you can sacrifice part of this, esp. for Verteilte systeme
- Atomicity
3-level schema architecture
-
Slide ??
-
TODO picture
-
Describes abstraction steps:
- /? TODO
- D/Logical data independence: changes to the logical schema must not require a change to an application (external schema) based on the structure
- If I renamed a Spalte, programs shouldn’t require changes
- D/Physical data independence: -/- (how data is stored) shouldn’t require changes to the logical schema
- Programs that use it shouldn’t have to get changes
-
Example DB
- External schema:
- website or anw. that use/view parts of some tables
- Conceptual:
- Table “Booking” has columns like
KdNr
(int), you can then look up the Kunde in the table “Kunden” etc.
- Table “Booking” has columns like
- Internal
Customer[KdNr: int]
- External schema:
Nel mezzo del deserto posso dire tutto quello che voglio.
comments powered by Disqus