Uncategorized

In the previous lecture we have looked at the basic properties of transactions namely the acid properties atomicity, concurrency, isolation and durability. In today’s lecture we are going to see how these properties will be realized by the transaction processing system within the database manager. We will take a few simple examples and through this examples we will illustrate how the transaction processing system will ensure the acid properties of the transaction Here is a very simple example shown in the slides here. There are two transactions here which are shown T1 and T2. T1 is a account transfer transaction, transfer of money from one account to other account. Now 1000 rupees had been transferred from account A to account B by transaction T1, transaction two is an interest payment transaction. So it is actually crediting into each account a 5 % interest into each of the accounts Now what is shown here is this T1 and T2 operating simultaneously on the banking system. Now what we will do is we will try to understand these two transactions in terms of the various operations performed by these transactions What I am going to do here is I will write T1 as performing several transactions or several operations. Now the first operation that is performed by T1 is to actually take the account A and read the value of the balance that is there in this account. So it is basically a read operation of the account. The second operation is essentially to add 1000 rupees into this account and the third operation is going to be writing the value back. The same thing is going to be done for the account B. So I will actually show that as the fourth operation but the subsequent operations will not be shown here They are self-explanatory one can understand after that, the other operations. Now the suffix here shows that this is the first operation and this gives the transaction id. One is the transaction id and the one here indicates that it is the first operation. So operation 1 1 indicates that this is the first instruction of transaction. This indicates that it is the second instruction of transaction 1 like that it is shown here Now if you basically take a operation Oi j, it indicates that this is j th operation of i the transaction. This is the notation that we will be using. Now as you can see here

this is basically a read operation on A and this is basically a write operation on A So we have between the processing we have the reads and the writes happening on the data items. Now we can also understand the transaction T2 also has shown in the slide as trying to do the following operations O21 is going to be a read of A and then the O22 operation is going to be an update on the value of the data item and then O22 O23 is going to be an item again A the rest of the operations as shown in the earlier case for the B, operation B Now what we are going to show you is what happens when these transactions are executed simultaneously on the database. Now a list of actions form a set of transactions as seen by the dbms. As you can see here O11 O12 O13 O14 are set of transaction, set of operations constituting T1. Similarly we have O21 O22 O23 constituting the set of instructions constituting transaction T2 Now it is possible for these operations to get interleaved in the sense that it is possible for these transactions, operations of the transactions to execute in an interleaved fashion. Now when this gets executed in an interleaved fashion, we basically call that as schedule. A schedule is nothing but a series of operations as executed by the database management system Now you can see here it is possible for these two transactions to execute concurrently, a set of interleavings that were shown is the operation T1 are constituting A equals to A plus hundred is executed here. Then T2 is executed which is equivalent to saying A equals to this operation is executed here, followed by the T2 of B then T1 again B equals to B minus. This is one schedule which is called a possible schedule, we can call this is as SC1 as one possible schedule. As you can see here these two constitute transaction T1 these two operations, these two operations constitute T2 Now there is another possible schedule also shown in the slide there in which case T1 and T2 are executed as shown here. But the other two operations are interchanged. This is executed before the other operation. Now when we actually have the schedules, one of the important criterion for this schedules to be valid is to see that these schedules produce proper consistent results at the end of the execution Now this schedule one can be seen as shown in the slide as read and the writes on the various data items. T1 is basically is reading A and then writing A. It is reading the old value of the bank account balance of account A and writing the new value for the balance here. T2 is also doing a read of A and write of A because you are computing the new value of A by calculating the interest that is payable for this account. Similar way we can also write for write B and then read B and write B and read B and then again write B. What

we are going to show you through this example is what happens when these reads and writes are interleaved from the consistency point of view Now we can see here the notion of what is correct from the execution point of view is shown here. Suddenly we don’t want transactions to execute one after the other because then the throughput of the system will come down drastically. You want as many operations as possible should be executed in a concurrent fashion to actually increase the throughput of the system. Now when concurrently executed transactions at the end of it whether they can be translated into what is called a serial execution Now in this particular case, what you would like to say is T1 executed completely after T2 this is one possibility or T2 executed after T1. As long as this is possible for you say, we call this kind of schedule as a serial schedule. And that is what is actually shown in the diagram here. A schedule is which is equivalent to one of the serial schedules is equivalent to saying that either T1 executed before T2 or T2 executed before T1 that is one of these should be possible. Now to see what really happens when the execution is not serial that means when finally you are not able to detect saying that the execution of the transactions is not as per the serial schedule I will take a very simple example and show how exactly will deduce the serial schedules Let us take a case of a transaction T1 with operation x and then 1 2 which is actually a write of y. Now we can take another transaction T2 where basically it is a read of y and then a write of x. Now no matter how these operations are executed. As long as it possible for you to say that all the operations at T1 have been executed before T2 which is equivalent to saying that if there is a schedule that says O11(x) and O12 (y) has actually finished before O21 (y) and O22 (x) are executed by the database manager. This is equivalent to saying that T1 finished before T2. This is what actually we mean by a serial execution of the transactions That is T1 finished execution before transaction T2 started executing. Now it is also possible for you to also have the reverse order where all the operations of T2 have been executed before T1. This is a very simple and straight forward case where we can easily save on all the operations of T1 and T2 are executed in this particular fashion very easy to see that T1 has actually finished all the operations before T2 stated executing. The only case where you will have problems is when some operations of T1 have been executed in such a way that they are interleaved with the execution of the operations of T2 Then the problem of deciding whether the schedule is equivalent to a serial schedule that all the operations of one transaction finish before the other becomes a important requirement And that is what actually we are going to look at how that can be done. In this particular case what we are going to say is all that will be required for us as a criterion whereas schedule is produced is two operations are said to conflict, we say the notion of a conflicting operation is when one of them is a write operation One of the operations

is a write operation Now to give you the little more simplistic view, let us say T1 is trying to read a data item x and T2 is actually trying to write the data item x. These two operations O1 of some i, O2 of some j are said to conflicting because they are operating on the same data item and they are conflicting with each other Here as you can see here, T1 is reading the data item x, T2 is writing the same data item x. Since both transactions are reading the same data item and one of the operations is a write operation, we say that these two operations are called conflicting Now whenever we have conflicting operations like this, the first inference is here O1 i and O2 j are conflicting. Now in transactions, if there is conflicting operations and there is a way this conflicting operations have been executed, let us say the conflicting operation in this particular case is executed is such a way that this is the order in a schedule. Now this order actually determines that T1 actually preceded T2 because it is conflicting on data item x and T1 has been executed before T2 And now this order should be preserved, no matter what happens with respect to other operations. And as long as you preserve that order with respect to all other operations, we say that the operations have been executed in a serial fashion or the schedule is reducible to a serial schedule. This is the concept of serializibilty. Now this is the important notion here is when transactions are executing concurrently. We need to ensure that the conflicting operations are serializable, all the conflicting operations are serializable Now here is a very simple case shown in the slide where it shows that where it is not possible to serialize, we need to actually abort the transaction. Now in this particular case it is shown that T1 actually is reading as we go to the earlier case, T¬1 is actually reading the data item A and then writing data item A. Now if you look at the T2 is also reading data item A and then writing data item A. Now the other part of T2 is read B and write B. Now as you can see here with respect to data item A, the order between T1 and T2 is T1 is before T2 Now if you look at the data item B, it is coming in the revere direction which actually means that as you can see here, on the data item B as far as the conflicting data item A is concerned, T1 is before T2, as far as B is concerned it is T2 before T1. As you can see this is on data item B and this is on data item A Now from this it is not possible for us to say whether T1 actually has finished before T2 or T2 has finished before T1. Since we can’t now decipher which one has actually finishing before the other, this schedule is non serializable schedule. And this is what should be avoided. A non serialzable schedule shows that the execution of the operations will lead to inconsistency. The database will be in an inconsistent state when we have the operations executed in a non serlizable way Now in this particular case it is shown here that T2 can commit but T1 has to abort. That’s

Now when there is a conflict between the two operations, whether it is a read write or a write write conflict, you need to actually serialize the operations by actually saying that they are executed in a serial order which is what we mean by conflict serializablitiy Now whenever there is a conflicting operations, we need to actually serialize the two operations which is known as the conflict serializability In this particular case oi p (x) and oj q (x) are the conflicting operations and they need to be serialized in a particular fashion and this is what we mean by conflict serializability We are going to see in later lectures, how the transactions are executed by the transaction manager to ensure that conflicting operations are serialized or serial schedules are produced by the transaction manager One of the simple technique that is used is what we see as a two phase locking and we are going to study that two phase locking as a technique for achieving conflict serailizability later in our lectures. Now here is actually what is shown as how exactly the transaction manager achieves some of the properties that we have been discussing The dbms ensure that a transaction either completes and its results are permanently written. This is what we mean by committing a transaction or no effect at all on the database, this is equivalent to saying that the transaction has been aborted. So we have two states for the transaction, either a commit state or an abort state. In the case of a commit state, all the operations of the transaction or executed in full and then they are committed. In the case of abort, no effect at all on the database as far as that transaction is concerned. Now the idea of transaction manager is it controls the execution of the transactions. As we saw in this particular case, it controls the execution of the transactions in such a way that the operations of the transactions are serializable Similarly if you take the recovery manager, recovery manager is responsible for undoing the actions of transaction which do not commit This is a equivalent to saying that the recovery manger is responsible for ensuring the property of atomicity. All actions of the committed transactions survive any kind of hardware or software failures. This is actually known as writing the committed transactions on to a stable storage. What we are going to do is we are going to look at little further into how the recovery manager ensures that properties of atomicity and durability Now what are issues involved in ensuring atomicity and durability? The following errors can occur when a transaction is executing. First is it could relate to logical errors. For example you are trying to withdraw some money from a bank account, it is possible that the account itself doesn’t exist or the account doesn’t has sufficient funds. In all this cases, transaction cannot proceed any further. This is what we mean by logical errors. The transaction may have to abort because of logical errors. There could be system errors. For example it is possible that there are problems of network, there are problems of system failures, temporary failure or power failure in which case when the power comes back, you need to know what really happened for your transaction with respect to already started transactions A simple example could be, you go to a atm and try to withdraw money from the atm and the power, when you actually press the button for withdrawing the money, the atm stops functioning atm failed due to various reasons. You would like to know whether their system is actually debited the amount from the bank, from your balance or not. That is basically system errors There could be crash, system crashes, there could be a hard disk failure, the disk head

could have been corrupted. So there could be various reasons why the system didn’t’ perform, it could be a system crash. So all these errors are possible When the system actually goes into any of these errors, you want to understand how exactly the atomicity and the durability properties can be maintained. A simple example trying to illustrate this point will be something like a file which all of us open on a windows machine or any of our unix machines. Now here when you open a file in a editor mode and tries to edit your file, there is no guarantee in terms of what happens when a power fails because the file could be in a very corrupted state. there is no guarantee for you in terms of the state of the file which all of us know we keep repeatedly saving the file, when we actually entering or writing some document, we try actually saving the document as many times as possible, so that when the power goes off or something else happens we still save the portion of the work we have actually done We don’t lose the file because of power failure all the work that we have done. Now the same thing cannot happen in the case of database systems because here the more critical data that is been in saved in the file. So we need to ensure that whatever happens when any of these failures happen, the system is still in a predictable state. That is the difference between ordinary file systems implemented by an operating system and a database implemented by commercial systems. They ensure that whenever these things happen, still the properties, the save properties for the transaction, the acid properties of the transactions are retained To explain how these properties are retained by the system, we need to also understand the different storage types that are available in a computer system. A simple volatile storage, we basically look at a simple volatile storage, this does not survive system crashes that means when the system actually crashes the storage is lost, the storage is volatile it is lost the minute the system crashes. When you talk about non volatile storage, the system actually survives these crashes. That means the storage is the, whatever you write into the storage is not lost when the system crash occurs A simple case is whatever is there in the main memory is lost when the power goes whereas if you have written it onto your hard disk, it survives a power failure because it is written into a more non volatile storage Now we also have a concept of a stable storage which is an abstraction of maintaining replicated copies in multiple non volatile storage media, so that whenever higher disasters occur we still have a way of getting our data back and that is we mean by the concept of a stable storage. Now what we are going to see is how this concepts are used for actually achieving the atomicity and durability properties in the transaction manager What we are going to show here in this particular case is what really happens when transactions have to roll back. The rolling of the transaction has to happen mainly because of logical errors or the system crashes and hence it has to be restored back to a previous state. Transaction abort requires roll back which means undoing all the actions of that particular transaction Now to ensure that roll back of the transaction occurs properly, what we have to do is all the writes of the transaction have to be properly recorded in what is called as a log file The log file retains all the information relating to the writes of the transaction and this will be used when the roll back has to occur Now if the system crashes, all active transactions during the crash need to be aborted when the system comes back. This is equivalent to saying that they will all be rolled back and the information that is there in the system in

the log file will be used to properly undo the transaction activities, whatever the transactions are being doing. What we are going to do is again in this particular case, we will take a very simple example and see how exactly this happens Now here is the case where case where the logs are maintained and how this logs are helpful in ensuring the atomicity property or the how the atomicity property will be realized by the database manager. What is shown here is a simple case of writing the logs before the transactions starts executing and making sure that it is carried over whenever the data item is being written or a new value for the data item is being written What you can see here is the first log that will be written as far as the transaction T one is concerned is what is called is the begin log, transacting begin log. Now what we have is basically a relating to a transaction, every transaction is preceded by the begin transaction as a keyword. Now this begin transaction actually tells the database manager to write what we see as a log, this is the transaction begin log Now this is a, the begin log has to be written onto the transaction. Now in between the transaction does various operations. As we have seen their, it’s possible that there are several operations which are done by the transaction in between and then we have and end transaction. Now this is actually the last instruction that is executed by this transaction, so we will have a what is called a commit log indicating that the transaction has actually committed which is equivalent to saying that all these operation have been successfully executed So between begin and end, at any point of time when there is a crash, we need to actually recover back to the starting point. And this is what we mean by actually roll back. What we are calling as a roll back is basically rolling back all the things that a transaction is done to the beginning. This is what is meant as a roll back. Now what we do is we actually ensure that whatever the transaction is doing, is written on to this log and this log will be used for rolling back the transaction whenever a crash occurs For example you can see that in the slide it is shown that T i writes a data item. Now as you can see the first one transaction starts, there is a transaction log T i start. Now the second thing that you notice is T i writes a data item. now there is a old value and a new value old value is the old value of the data item and the new value is the new value of the data item x. so there is a log that is written there which shows that T i x old value and a new value is shown here, this is how actually the log is written Whenever there is a change in the data item, we basically write the log and now this log shows what was the old value and what is the new value now. When you come to the last transaction, basically you have a commit log that is a T i commit log. So as shown here, we have a begin log and a commit log and in between whatever is happening is being recorded there as shown in the slide there. So one of the things that we are going to look at now is how this logs can be used for recovery purposes How exactly this logs can be used by the database manager to ensure that whenever those kind of failures that we are talking earlier occurs how the system will recover back from those failures Now one of the things when writing this logs is one of the things that one should remember is the id’s of the transactions are stored So that we know to this logs pertain to which

transactions, so transaction id’s are appended when the logs records to identify the transactions for which this have logs been produced. Logs are replicated and stored in a stable storage This is also very important because in the logs themselves are lost then there is no way you can recover back Logs only assuming that the logs are written on to a stable storage, you can ensure that the transaction can be made to recover. But if the logs themselves are subjected to failure then you will not be able to recover back and hence logs are replicated, one of the assumption we make it logs are replicated and they are stored in a stable storage. So when we say a log is written, we assume that the values relating to the log have been written on a stable storage and it is possible for us to recover this information at any point of time Now as you can see here we are also showing how this log entries will also be ordered, ordering of the log entries. We say a transaction T are can commit only if the log entry relating to that is saved on a stable storage. This is equivalent to saying as you can see here, when you write this commit log this commit log is actually written on to a stable storage then we say the transaction is committed This is the point where it is possible for the transaction now to say that it is committed Now before this is actually written, this log is written all the other entries before this pertaining to this transaction should have also been written onto the stable storage You should never write the commit log before all the other log entries relating to this transaction have been saved on the stable storage. Now only after the writing the entries relating to the logs, you should write the data items themselves after this point onto the stable storage. This is very important, these steps are very important because if you perform them in any other order, you will have problems in terms of recovering back First requirement is all the log entries relating to this transaction should have written onto the stable storage in the first instance, before you are writing the commit log. Only after writing the commit log, the data item values pertaining to the transaction can themselves can be written onto the stable storage The reason for this is simple. If you don’t write the log values first on to the stable storage, there is no way if something happens to recover from that particular failure. for example if you have written the data value onto the stable storage, now something happens there is now way of finding out what is the state in which the transaction is unless the logs are written properly. So logs are the bases for the database manager to find out what is the state in which the transaction is when a failure is occurred. And hence it is important for you to first write all the logs relating to the transaction then write the commit log and then write all the data items onto the stable storage. This is how one needs to order or write the various things relating to the transaction We will take a very simple example. A simple example in this case to see how the two transactions can really execute writing their logs. In this particular case, it is shown T one and T two T one is actually reading certain data items and writing certain data items. Similarly T two is also reading and writing certain data items. For completeness sake, we also have shown the initial values that are there in the database when this transactions T one and T two start executing As can be seen in the slide, the initial values of A are A is 100, B is 300, c is 5, D is 60 and E is 80. Now T1 when starts executing, it is going to read the values of A will increment