serhii.net

In the middle of the desert you can say anything you want

06 Oct 2022

DB Lecture 1

Course

Orga

  • moodle for communication, incl. cancellations
  • Präsenzlehre
  • Folien online?
  • Questions anytime during/after lectures
  • Sprechstunde 13:40 - 14:00 (?)
  • email
  • Answers to Klausur can be in English weil die Folien sind in English

Prerequisites

  • modern relationa DB
  • SQL?
  • /D Mengenlehre

Praktikum/Übung

  • Solve tasks with NoQL db - data import, queries
  • Computer pool with PostgresSQL and VMs
  • Possible to test from home through VPN?

Score

  • Points from Übungen?
    • she suggests to visit all of them
  • Written (?) may change

Other

  • Library - in slide
  • Masterthesis possible
    • Current topics in data mgmt for DS (life sci. ontologies, semantic annotation)
    • PP - temporal data
    • DS/scientific workflows
    • “on my website I have topics of theses ich habe betreut”
      • TODO website

Content

  • DBs, esp. NoSQL
    • NoSQL
      • categories, properties,
      • NoSQL vs relational DB
    • partitioning, consistency, repliction
    • storage, retrieval
  • key-value and document stores
  • wide column / record stores
  • search in large data sets

Relevance & context

  • relevance
    • relevant
  • pipeline:
    • steps
      • /D daten-acquise - getting / generating
      • Cleaning
      • Integration
      • Analyse - classical ds/ml
      • Evaluierung
      • Interpretierung
    • cursive - 80%
  • choosing
    • tradeoffs: sometimes you need to have it fast more than consistency etc.

Chapter 1. - Recap

Basics

  • D/DB System
    • system to persistently store and manage large data sometimes
    • realization of data-intensive apps
  • Why?
    • avoid data loss
    • many users
    • long-term storage

Data Model

  • D/Data Model is a model that descriiract way how data is represented in an info or DB system
    • a system of concepts and their interrelatiosn
    • the “language” used to describe data
    • syntax and semantic
    • fundamental to other bits like integrity etc.
    • Example:
      • Java: objects of classes that have attributes and references to other obj + methods to access
      • relational db - structures tables of tuples with attrs, foreign keys, constraints etc.
        • tuple here is a row/Zeile in a DB?
        • TODO L/ Zeile, Spalte
      • there are also hierarchical etc.

Information system

  • D/Database management system (DBMS) - software system to define, manage ,process and analyze DB data
    • (DB is the data itself)
  • DB <-> [database management system (DBMS)]
    
  • Example: File system for data management
    • why not files/folders?
      • two files, one with student name, the other with marks etc.
      • redundancy/inconsistency
        • updating / renaming names/Fächer is hard to do in multiple files
      • increased storage reqs
      • no central data storage
        • every application manages their own data independence
        • dep on file structure
        • programmers have to know about the storage and internal data repr.
      • multiple users working on the same data
      • no waranties wrt data protection / privacy
        • access rights
      • /IL “gentrennte Zugriff geben wollen würde”

Transaction

  • Interface for transactions 1. begin of transaction (BOT) 2. commit transaction (COMMIT WORK in SQL) 3. rollback trans. (ROLLBACK WORK in SQL)
  • D/DML (Data manipulation language)
    • has commands like INSERT, UPDATE, DELETE, etc. in SQL
  • possible terminations of a transaction
    • BOT,
    • DML[1], DML[2], ... DML[k],
      • can be multiple, but capsuled as single transaction!
    • Two options:
      • Normal: COMMIT WORK
      • Abnormal:
        • ROLLBACK WORK (zB Integritätsbedingung verletzt)
        • Enforced rollback (Stromausfall)

ACID

  • RDBMS ensure ACID properties for transactions
  • D/ACID
    • Atomicity
      • “all or nothing” property
      • if one part fails, the entire transaction fails and the DB state is left unchanged
    • Consistency
      • A succesful transaction preserves DB consistency
      • /? Definition of integrity constraints
    • Isolation
      • Concurrent execution of transactions results in a system state as if they were executed serially
      • T. can’t rely on interm. or unfinished state.
      • Won’t it be slow?
        • Sometimes you can sacrifice part of this, esp. for Verteilte systeme

3-level schema architecture

  • Slide ??

  • TODO picture

  • Describes abstraction steps:

    • /? TODO
    • D/Logical data independence: changes to the logical schema must not require a change to an application (external schema) based on the structure
      • If I renamed a Spalte, programs shouldn’t require changes
    • D/Physical data independence: -/- (how data is stored) shouldn’t require changes to the logical schema
      • Programs that use it shouldn’t have to get changes
  • Example DB

    • External schema:
      • website or anw. that use/view parts of some tables
    • Conceptual:
      • Table “Booking” has columns like KdNr(int), you can then look up the Kunde in the table “Kunden” etc.
    • Internal
      • Customer[KdNr: int]
Nel mezzo del deserto posso dire tutto quello che voglio.