### Script generated by TTT Title: Petter: Programmiersprachen (14.10.2015) Date: Wed Oct 14 14:21:05 CEST 2015 Duration: 90:18 min Pages: 49 #### 9898 9898 8988 TECHNISCHE UNIVERSITÄT MÜNCHEN FAKULTÄT FÜR INFORMATIK # **Programming Languages** Dr. Michael Petter Winter term 2015 Dr. Michael Petter 1/4 ## **Overall Structure** This course is given by - Michael Petter: petter@in.tum.de - there are at least 12 lectures - there will be a written exam - there is no repeated exam in this winter or the following summer semester # **Overall Structure** This course is given by - Michael Petter: petter@in.tum.de - there are at least 12 lectures - there will be a written exam - there is no repeated exam in this winter or the following summer semester We present selected topics that focus on current issues in the design or implementation of programming languages: - concurrency - modularization Dr. Michael Petter 2/4 Dr. Michael Petter ### **Outline** - Low-Level Concurrency, Memory Barriers - Wait-Free Algorithms - Locks and Monitors - Transactional Memory #### 2. Modularization - Method dispatching - Multiple Inheritance - Mixins and Traits - Prototype based Languages - Aspect Orientation - Generics and Templates Dr. Michael Petter 3/4 ### **Administration** #### Lectures: - every Wednesday at 14:15pm - no lectures on bank holidays: 23.12., 30.12., 6.1. - slides on http://www2.in.tum.de/hp/Main?nid=281 - lectures are recorded <a href="http://ttt.in.tum.de">http://ttt.in.tum.de</a> Michael Petter ### **Administration** #### Lectures: - every Wednesday at 14:15pm - no lectures on bank holidays: 23.12., 30.12., 6.1. - slides on http://www2.in.tum.de/hp/Main?nid=281 - lectures are recorded http://ttt.in.tum.de #### Exercises: - an exercise sheet is made available with every lecture - solving the sheets is *voluntary* but *recommended* ## **Administration** #### Lectures: - every Wednesday at 14:15pm - no lectures on bank holidays: 23.12., 30.12., 6.1. - slides on http://www2.in.tum.de/hp/Main?nid=281 - lectures are recorded http://ttt.in.tum.de #### Exercises: - an exercise sheet is made available with every lecture - solving the sheets is voluntary but recommended #### Tutorial sessions: - for now, Fridays at 10.00am in this room (H2), starting Oct. 16 - the exercises are meant to be completed at home - solutions to these exercises will be presented - be there! - if you have questions of the general kind - if you have problems with the exercise sheet Petter 4/4 Dr. Michael Petter Programming Languages offer a broad spectre for hands on experience! - Programming projects - ► Earn a .3 bonus on the final exam - Announced in the middle of the lecture - Inverted Classroom - ► Switch over to IC for *Modularization* part - Preparation of lecture at home - ► Hands on experiences live at the "lecture" **TECHNISCHE** FAKULTÄT Dr. Michael Petter 9 | 1 | 9 | 0 | 0 | 9 | 0 | 0 | 8 | 0 | 0 | UNIVERSITÄT FÜR MÜNCHEN **INFORMATIK** **Programming Languages** Concurrency: Memory Consistency Dr. Michael Petter Winter term 2015 # **Need for Concurrency** - in 1997 the *Pentium P55C* had 4.5M transistors - in 2006 the Itanium 2 had 1700M transistors - --- Intel could have built a processor with 256 Pentium cores in 2006 ory Consistency 1/53 Memory Consistency Motivation ## **Need for Concurrency** Consider two processors: - in 1997 the *Pentium P55C* had 4.5M transistors - in 2006 the Itanium 2 had 1700M transistors - → Intel could have built a processor with 256 Pentium cores in 2006 - most programs are not inherently parallel - parallelizing a program is between difficult and impossible - correctly communicating between different cores is challenging - --- correctness of concurrent communication is very hard - low-level aspects: locking algorithms must be correct - high-level aspects: program may deadlock - a program on n cores runs $m \ll n$ times faster - → all effort is voided if program runs no faster - distributing work load is application specific ### The free lunch is over Single processors cannot be made much faster due to physical limitations. Source: D. Patterson, UC-Berkeley ### The free lunch is over Single processors cannot be made much faster due to physical limitations. Source: D. Patterson, UC-Berkeley But Moore's law still holds for the number of transistors: - they double every 18 months for the foreseeable future - may translate into doubling the number of cores programs have to become parallel # **Concurrency for the Programmer** How is concurrency exposed in a programming language? - spawning of new concurrent computations - communication between threads ## **Concurrency for the Programmer** - spawning of new concurrent computations - communication between threads Communication can happen in many ways: - communication via shared memory (this lecture) - atomic transactions on shared memory - message passing ### **Learning Outcomes** - Happened-before Partial Order - Sequential Consistency - The MESI Cache Model - Weak Consistency - Memory Barriers Memory Consistence Motivation 4 / 53 ### **Communication between Cores** We consider the concurrent execution of these functions: # Thread A ### Thread B ``` void foo(void) { a = 1: b = 1; } void bar(void) { while (b == 0) assert (a == 1) } ``` - initial state of a and b is 0 - A writes a before it writes b - B should see b go to 1 before executing the assert statement - the assert statement should always hold - here the code is *correct* if the assert holds b → correctness means: writing a 1 to a happens before reading a 1 in b #### **Definition (Strict consistency)** Read operations from location l return values, written by the most recent write operation to l. Memory Consistence emory Consistency 5/5 # **Strict Consistency** # **Strict Consistency** Assuming foo and bar are started on two cores operating in lock-step. Then *one* of the following may happen: Memory Consistency Memory Consistency CIEO Innania Canadatanan emory Consistency 6/5 ## **Strict Consistency** Assuming foo and bar are started on two cores operating in lock-step. Then *one* of the following may happen: A unique order between memory accesses is unrealistic in reality: - each conditional (and loop iteration) doubles the number of possible lock-step executions - processors use caches → lock-step assumption is violated since cache behavior depends on data 6/53 ## **Strict Consistency** Assuming foo and bar are started on two cores operating in lock-step. Then *one* of the following may happen: A unique order between memory accesses is unrealistic in reality: - each conditional (and loop iteration) doubles the number of possible lock-step executions - processors use caches → lock-step assumption is violated since cache behavior depends on data → strict consistency is too strong to be realistic ldea: state correctness in terms of what event *may* happen before another one метс emory Consistency 6 / 5 #### onsistency memory consistency # **Events in a Distributed System** A process as a series of events [Lam78]: Given a distributed system of processes $P, Q, R, \ldots$ , each process P consists of events $\bullet p_1, \bullet p_2, \ldots$ # The Happend-Before Relation in Distributed Systems mory Consistency Happened-Before Relation Memory Consistency lappened-Before Relation 8/5 # **Events in a Distributed System** A process as a series of events [Lam78]: Given a distributed system of processes $P, Q, R, \ldots$ , each process P consists of events $\bullet p_1, \bullet p_2, \ldots$ Example: - event • $p_i$ in process P happened before • $p_{i+1}$ - if • $p_i$ is an event that sends a message to Q then there is some event • $q_i$ in Q that receives this message and $\bullet p_i$ happened before $\bullet q_i$ # **Excursion:** Wandlore (I) Events in time are like power of wands: # **Excursion:** Wandlore (I) Events in time are like power of wands: # **Excursion:** Wandlore (I) Events in time are like power of wands: # **The Happened-Before Relation** #### **Definition** If an event p happened before an event q then $p \rightarrow q$ . #### Observe: - $\rightarrow$ is partial (neither $p \rightarrow q$ or $q \rightarrow p$ may hold) - $\bullet$ $\rightarrow$ is irreflexive $(p \rightarrow p \text{ never holds})$ - $\bullet$ is transitive $(p \rightarrow q \land q \rightarrow r \text{ then } p \rightarrow r)$ - $\bullet$ is asymmetric (if $p \to q$ then $\neg (q \to p)$ ) - $\rightsquigarrow$ the $\rightarrow$ relation is a *strict partial order* Note: a strict partial order $\prec$ differs from a (non-strict) partial order $\preceq$ due to: | strict partial order | non-strict partial order | |----------------------------------------|-------------------------------------------------| | irreflexive $\neg (p \prec p)$ | reflexive $p \leq p$ | | asymmetric | antisymmetric | | $p \prec q$ implies $\neg (q \prec p)$ | $p \preceq q \land q \preceq p$ implies $p = q$ | Memory Consistency Happened-Before Relation 44 / 51 # **Concurrency in Process Diagrams** Let $a \not\rightarrow b$ abbreviate $\neg (a \rightarrow b)$ . ## Definition Two distinct events p and q are said to be *concurrent* if $p \not\to q$ and $q \not\to p$ . - $p_1 \rightarrow r_4$ in the example - $p_3$ and $q_3$ are, in fact, concurrent since $p_3 \not\rightarrow q_3$ and $q_3 \not\rightarrow p_3$ Memory Consistency tappened-Before Relation 40 / 50 # **Ordering** #### **Definition (Clock Condition)** Function C satisfies the *clock condition* if for any events p, q # **Ordering** Let ${\it C}$ be a ${\it logical clock}$ that assigns a time-stamp ${\it C}(p)$ to each event p. ### **Definition (Clock Condition)** Function C satisfies the *clock condition* if for any events p, q $$p \to q \implies C(p) < C(q)$$ For a distributed system the *clock condition* holds iff: - **1** $p_i$ and $p_j$ are events of P and $p_i \rightarrow p_j$ then $C(p_i) < C(p_j)$ - ② p is the sending of a message by process P and q is the reception of this message by process Q then C(p) < C(q) # **Ordering** Let C be a *logical clock* that assigns a time-stamp C(p) to each event p. #### **Definition (Clock Condition)** Function C satisfies the *clock condition* if for any events p, q $$p \rightarrow q \implies C(p) < C(q)$$ For a distributed system the *clock condition* holds iff: - $p_i$ and $p_j$ are events of P and $p_i \rightarrow p_j$ then $C(p_i) < C(p_j)$ - ② p is the sending of a message by process P and q is the reception of this message by process Q then C(p) < C(q) $\rightarrow$ a logical clock *C* that satisfies the clock condition describes a *total order* a < b (with C(a) < C(b)) that *embeds* the strict partial order $\rightarrow$ Memory Consistence Happened-Before Relation 12 / 5 ## **Ordering** Let C be a *logical clock* that assigns a time-stamp C(p) to each event p. #### **Definition (Clock Condition)** Function C satisfies the *clock condition* if for any events p, q $$p \to q \implies C(p) < C(q)$$ For a distributed system the *clock condition* holds iff: - $\bigcirc$ $p_i$ and $p_j$ are events of P and $p_i \rightarrow p_j$ then $C(p_i) < C(p_j)$ - ② p is the sending of a message by process P and q is the reception of this message by process Q then C(p) < C(q) $\rightarrow$ a logical clock C that satisfies the clock condition describes a *total order* a < b (with C(a) < C(b)) that *embeds* the strict partial order $\rightarrow$ The *set* defined by all *C* that satisfy the clock condition is exactly the *set* of executions possible in the system. ightsquigarrow use the process model and ightsquigarrow to define better consistency model Memory Consistency Happened-Before Relation 12 / 52 # **Defining** C Satisfying the Clock Condition Given: # **Defining** C **Satisfying the Clock Condition** Given: Memory Consistency Happened-Before Relation 14 / 53 # **Summing up Happened-Before Relations** We can model concurrency using processes and events: - there is a *happened-before* relation between the events of each process - there is a *happened-before* relation between communicating events - happened-before is a strict partial order - a clock is a total strict order that embeds the *happened-before* partial order Sequential Consistency on Multi-Processor Machines # **Moving Away from Strict Consistency** Idea: use process diagrams to model more relaxed memory models. Given a path through each of the threads of a program: - consider the actions of each thread as events of a process - use more processes to model memory - ▶ here: one process per variable in memory - concisely represent some interleavings # **Definition: Sequential Consistency** **Definition (Sequential Consistency Condition [Lam78])** The result of any execution is the same as if - the operations of all the processors were executed in some sequential order and - the operations of each individual processor appear in this sequence in the order specified by its program. Sequential Consistency applied to Multiprocessor Programs: Given a program with n threads, - $\bullet$ for fixed operation sequences $p_0^1, p_1^1, \ldots$ and $p_0^2, p_1^2, \ldots$ and $p_0^n, p_1^n, \ldots$ keeping the program order - 2 executions obey the clock condition on the $p_i^i$ , - all executions have the same result Yet, in other words: - • defines the execution path of each thread - each execution mentioned in 2 is one *interleaving* of processes - 1 declares that the result of running the threads with these interleavings is always the same. ## **Disproving Sequential Consistency** Sequential Consistency in Multiprocessor Programs: Given a program with n threads, - $\bullet$ for fixed operation sequences $p_0^1, p_1^1, \ldots$ and $p_0^2, p_1^2, \ldots$ and $p_0^n, p_1^n, \ldots$ keeping the program order - 2 executions obey the clock condition on the $p_i^i$ , - all executions have the same result Idea for showing that a system is *not* sequentially consistent: - pick a result obtained from a program run on a SC system - pick an execution and a total ordering of all operations - add extra processes to model other system components - the original order ② becomes a partial order → - show that total orderings C' exist for $\rightarrow$ for which the result differs # Weakening the Model There is no observable change if calculations on different memory locations can happen in parallel. Idea: model each memory location as a different process Sequential consistency still obeyed: - the accesses of foo to a occurs before b - the first two read accesses to b are in parallel to a=1 # **Benefits of Sequential Consistency** Benefits of the sequential consistency model: - concisely represent all interleavings that are due to variations in speed - synchronization using time is uncommon for software - a good model for correct behaviors of concurrent programs - programs results besides SC results are undesirable (they contain races) # **Benefits of Sequential Consistency** Benefits of the sequential consistency model: - concisely represent all interleavings that are due to variations in speed - synchronization using time is uncommon for software - a good model for correct behaviors of concurrent programs - programs results besides SC results are undesirable (they contain races) It is a realistic model for older hardware: - sequential consistency model suitable for concurrent processors that acquire exclusive access to memory - processors can speed up computation by using caches and still maintain sequential consistency # **Benefits of Sequential Consistency** Benefits of the sequential consistency model: - concisely represent all interleavings that are due to variations in speed - synchronization using time is uncommon for software - → a good model for correct behaviors of concurrent programs - → programs results besides SC results are undesirable (they contain *races*) It is a realistic model for older hardware: - sequential consistency model suitable for concurrent processors that acquire exclusive access to memory - processors can speed up computation by using caches and still maintain sequential consistency Not a realistic model for modern hardware with out-of-order execution: - what other processors see is determined by complex optimizations to caching - → need to understand how caches work Introducing Caches: The MESI Protocol ry Consistency The MESI Protocol