Normalization is better for OLTP (On-line Transaction Processing) systems:
- Quick updates – data is in one place, no duplication
- Quick inserts – data is in one place, no duplication
Denormalization is better for OLAP (On-line Analytical Processing) systems:
- Quick reports – no need for joins
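As a rough illustration, here is a minimal sketch of the same data modelled both ways; the table and column names (Customers, Orders, OrdersReport) are made up for the example:

```sql
-- Normalized (OLTP-friendly): each fact lives in exactly one place
CREATE TABLE Customers (
    CustomerID INT PRIMARY KEY,
    Name       NVARCHAR(100) NOT NULL
);

CREATE TABLE Orders (
    OrderID    INT PRIMARY KEY,
    CustomerID INT NOT NULL REFERENCES Customers(CustomerID),
    OrderDate  DATE NOT NULL
);

-- Denormalized (OLAP-friendly): the customer name is duplicated into the
-- reporting table, so reports can run without a join
CREATE TABLE OrdersReport (
    OrderID      INT PRIMARY KEY,
    CustomerName NVARCHAR(100) NOT NULL,
    OrderDate    DATE NOT NULL
);
```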
- Basic Availability - The database appears to work most of the time.
- Soft-state - Stores don’t have to be write-consistent, nor do different replicas have to be mutually consistent all the time.
- Eventual consistency - Stores exhibit consistency at some later point (e.g., lazily at read time).
The CAP theorem, also known as Brewer's theorem, states that a distributed database system can only simultaneously provide two out of the following three guarantees:
- Consistency: Every read receives the most recent write or an error.
- Availability: Every request receives a (non-error) response, without the guarantee that it contains the most recent write.
- Partition Tolerance: The system continues to operate despite an arbitrary number of messages being dropped or delayed by the network between nodes.
In essence, the theorem posits that in the presence of a network partition, one must choose between consistency and availability. No distributed system can guarantee all three elements at the same time.
- Atomicity - either all of the pieces of information are committed or none are
- Consistency - A transaction either creates a new and valid state of data, or returns all data to its state before the transaction was started.
- Isolation - A transaction in process and not yet committed must remain isolated from any other transaction.
- Durability - Committed data is saved by the system such that, even in the event of a failure and system restart, the data is available in its correct state.
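A minimal T-SQL sketch of atomicity, assuming a hypothetical Accounts table: either both updates are committed, or neither is:

```sql
BEGIN TRANSACTION;
BEGIN TRY
    UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;
    UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2;
    COMMIT TRANSACTION;       -- both changes become durable together
END TRY
BEGIN CATCH
    ROLLBACK TRANSACTION;     -- neither change is kept
END CATCH;
```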
- With a clustered index the rows are stored physically on the disk in the same order as the index. There can therefore be only one clustered index.
- With a non-clustered index there is a second list that has pointers to the physical rows. You can have many non-clustered indexes, although each new index will increase the time it takes to write new records.
- It is generally faster to read from a clustered index if you want to get back all the columns. You do not have to go first to the index and then to the table.
- Writing to a table with a clustered index can be slower, if there is a need to rearrange the data.
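For illustration, creating both kinds of index on a hypothetical Orders table (SQL Server syntax):

```sql
-- One clustered index per table: rows are physically ordered by OrderID
CREATE CLUSTERED INDEX IX_Orders_OrderID ON Orders (OrderID);

-- Any number of non-clustered indexes: separate structures pointing at the rows
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID ON Orders (CustomerID);
```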
- A primary key also implies NOT NULL, but a unique index can be nullable.
- There can be only one primary key, but there can be multiple unique indexes.
- If there is no clustered index defined, then the primary key will be the clustered index.
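A small sketch of the difference on a hypothetical Users table: the primary key column cannot be NULL, while the uniquely indexed column can be:

```sql
CREATE TABLE Users (
    UserID INT NOT NULL PRIMARY KEY,   -- implies NOT NULL; only one per table
    Email  NVARCHAR(256) NULL
);

-- Column can be NULL; a table can have many unique indexes
CREATE UNIQUE INDEX UX_Users_Email ON Users (Email);
```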
- Dirty Reads - A dirty read occurs when a transaction reads data that has not yet been committed. For example, suppose transaction 1 updates a row. Transaction 2 reads the updated row before transaction 1 commits the update. If transaction 1 rolls back the change, transaction 2 will have read data that is considered never to have existed.
- Non-repeatable Reads - A non-repeatable read occurs when a transaction reads the same row twice but gets different data each time. For example, suppose transaction 1 reads a row. Transaction 2 updates or deletes that row and commits the update or delete. If transaction 1 rereads the row, it retrieves different row values or discovers that the row has been deleted.
- Phantoms - A phantom is a row that matches the search criteria but is not initially seen. For example, suppose transaction 1 reads a set of rows that satisfy some search criteria. Transaction 2 generates a new row (through either an update or an insert) that matches the search criteria for transaction 1. If transaction 1 reexecutes the statement that reads the rows, it gets a different set of rows.
- Read Uncommitted – Read Uncommitted is the lowest isolation level. At this level, one transaction may read not-yet-committed changes made by other transactions, thereby allowing dirty reads. Transactions are not isolated from each other.
- Read Committed (default for MSSQL, PostgreSQL) – This isolation level guarantees that any data read is committed at the moment it is read, so it does not allow dirty reads. The transaction holds a read or write lock on the current row, preventing other transactions from reading, updating or deleting it.
- Repeatable Read (default for MySQL, MariaDB) – This is a more restrictive isolation level. The transaction holds read locks on all rows it references and write locks on all rows it inserts, updates, or deletes. Since other transactions cannot read, update or delete these rows, non-repeatable reads are avoided.
- Serializable – This is the highest isolation level. A serializable execution is one in which concurrently executing transactions appear to be executing serially.
Isolation level | Dirty reads | Non-repeatable reads | Phantoms
---|---|---|---
Read uncommitted | Yes | Yes | Yes
Read committed | No | Yes | Yes
Repeatable read | No | No | Yes
Serializable | No | No | No
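In T-SQL the isolation level is chosen per session; a minimal example, assuming a hypothetical Accounts table:

```sql
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

BEGIN TRANSACTION;
    SELECT Balance FROM Accounts WHERE AccountID = 1;  -- read locks are held...
    -- ...so re-reading the same row within this transaction returns the same value
    SELECT Balance FROM Accounts WHERE AccountID = 1;
COMMIT TRANSACTION;
```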
A type of injection attack in which the attacker gets a malicious batch of SQL executed by our database server.
- NEVER, EVER build SQL by concatenating user input into a raw query string
- Sanitize your query manually (but seriously, we are not in the 90s – use something else).
- Use parameterized queries
- Use an ORM
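As an example of a parameterized query in T-SQL, user input can be passed to sp_executesql as a typed parameter instead of being concatenated into the statement (the Users table and the input value are hypothetical):

```sql
DECLARE @UserInput NVARCHAR(100) = N'alice''; DROP TABLE Users; --';

-- The input is bound as a parameter, never spliced into the SQL text,
-- so it cannot change the structure of the statement
EXEC sp_executesql
    N'SELECT * FROM Users WHERE UserName = @name',
    N'@name NVARCHAR(100)',
    @name = @UserInput;
```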
`COALESCE` will take any number of parameters, and return the first value encountered that isn't `NULL`.
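For example:

```sql
SELECT COALESCE(NULL, NULL, 'third', 'fourth');  -- returns 'third'
```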
`DELETE` is a DML command.

- A `DELETE` statement is executed using a row lock; each row in the table is locked for deletion.
- We can specify filters in a WHERE clause.
- It deletes specified data if a WHERE condition exists.
- `DELETE` activates triggers, because the operations are logged individually.
- Slower than `TRUNCATE`, because it keeps logs.
- Rollback is possible.
`TRUNCATE` is a DDL command.

- `TRUNCATE TABLE` always locks the table and page, but not each row.
- Cannot use a WHERE condition.
- It removes all the data.
- `TRUNCATE TABLE` cannot activate a trigger, because the operation does not log individual row deletions.
- Faster performance-wise, because it doesn't keep any logs.
- Rollback is not possible.
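Side by side, on a hypothetical Orders table:

```sql
DELETE FROM Orders WHERE OrderDate < '2010-01-01';  -- DML, row locks, filterable, logged per row
TRUNCATE TABLE Orders;                              -- DDL, removes everything, minimal logging
```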
The basic difference between UNION and UNION ALL is that the UNION operation eliminates duplicate rows from the result set, while UNION ALL returns all rows, duplicates included.
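For example, assuming hypothetical Customers and Suppliers tables that both have a City column:

```sql
SELECT City FROM Customers
UNION
SELECT City FROM Suppliers;      -- duplicates removed

SELECT City FROM Customers
UNION ALL
SELECT City FROM Suppliers;      -- every row kept, duplicates included
```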
`HAVING` is used to check conditions after the aggregation takes place.

`WHERE` is used to check conditions before the aggregation takes place.
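For example, on a hypothetical Orders table, WHERE filters rows before grouping and HAVING filters the aggregated groups:

```sql
SELECT   CustomerID, COUNT(*) AS OrderCount
FROM     Orders
WHERE    OrderDate >= '2020-01-01'   -- filters individual rows first
GROUP BY CustomerID
HAVING   COUNT(*) > 5;               -- then filters the aggregated groups
```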
- A procedure can return zero or n values, whereas a function must return exactly one value.
- Procedures can have input/output parameters, whereas functions can have only input parameters.
- A procedure allows SELECT as well as DML statements in it, whereas a function allows only SELECT statements.
- Functions can be called from a procedure, whereas procedures cannot be called from a function.
- Exceptions can be handled by a try-catch block in a procedure, whereas a try-catch block cannot be used in a function.
- We can use transaction management in a procedure, whereas we can't in a function.
- Procedures cannot be used in a SELECT statement, whereas a function can be embedded in a SELECT statement.
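A minimal T-SQL sketch of both (the names and tables are made up): the function must return a value and can be embedded in a SELECT, while the procedure can contain DML:

```sql
CREATE FUNCTION dbo.GetDiscount (@Total MONEY)
RETURNS MONEY
AS
BEGIN
    RETURN CASE WHEN @Total > 1000 THEN @Total * 0.1 ELSE 0 END;
END;
GO

CREATE PROCEDURE dbo.ArchiveOldOrders @CutOff DATE
AS
BEGIN
    -- DML is allowed here, but not inside the function above
    DELETE FROM Orders WHERE OrderDate < @CutOff;
END;
GO

-- The function can be embedded in a SELECT; the procedure cannot
SELECT OrderID, dbo.GetDiscount(TotalDue) AS Discount FROM Orders;
```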
- UDFs can be used anywhere in the WHERE/HAVING/SELECT clauses of a SQL statement, whereas stored procedures cannot.
- UDFs that return tables can be treated as another rowset. This can be used in JOINs with other tables.
- Inline UDFs can be thought of as views that take parameters, and can be used in JOINs and other rowset operations.
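For instance, a hypothetical inline table-valued UDF used as a parameterized rowset:

```sql
CREATE FUNCTION dbo.OrdersForCustomer (@CustomerID INT)
RETURNS TABLE
AS
RETURN
(
    SELECT OrderID, OrderDate
    FROM   Orders
    WHERE  CustomerID = @CustomerID
);
GO

-- Treated like a view that takes a parameter
SELECT c.Name, o.OrderID
FROM   Customers AS c
CROSS APPLY dbo.OrdersForCustomer(c.CustomerID) AS o;
```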
- Stored procedures are stored in the database in compiled form.
- Dynamic SQL statements are dynamically constructed at run time.
They are the same.
No, triggers cannot be called manually.
Stored procedures written in a CLR-compatible language like C#, VB.NET (How about no.), F#, etc.
- If your logic mostly includes transformations of massive sets of data, which can be performed using set operations, then use T-SQL.
- If your logic mostly includes complex computations of relatively small amounts of data, use CLR.
- It’s generally a better idea to use the `EXISTS` function over `COUNT`. `EXISTS` is a short-circuiting operator, meaning it stops execution of the query after a single result is found. `COUNT`, on the other hand, checks all the rows that satisfy the conditions of the query.
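For example, when only the presence of a matching row matters (hypothetical Orders table):

```sql
-- Short-circuits after the first matching row
IF EXISTS (SELECT 1 FROM Orders WHERE CustomerID = 42)
    PRINT 'Customer has orders';

-- Counts every qualifying row just to answer a yes/no question
IF (SELECT COUNT(*) FROM Orders WHERE CustomerID = 42) > 0
    PRINT 'Customer has orders';
```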
- A correlated subquery is a subquery that uses values from the outer query. In this case the inner query has to be executed for every row of the outer query.
- A simple subquery doesn't use values from the outer query and is evaluated only once.
- Use uncorrelated subqueries when possible – better performance.
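For example, using hypothetical Customers and Orders tables:

```sql
-- Correlated: the inner query references c.CustomerID, so it is evaluated per outer row
SELECT c.Name
FROM   Customers AS c
WHERE  EXISTS (SELECT 1 FROM Orders AS o WHERE o.CustomerID = c.CustomerID);

-- Simple (uncorrelated): the inner query is independent and can be evaluated once
SELECT Name
FROM   Customers
WHERE  CustomerID IN (SELECT CustomerID FROM Orders);
```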
- During compilation, the value passed as the stored procedure parameter is evaluated and used to create an execution plan.
- That value is also stored with the execution plan in the plan cache.
- Future executions of the procedure re-use the plan that was compiled with that reference value.
- Recompiling
```sql
-- stored procedure version
CREATE PROCEDURE dbo.sample @ProductID INT
WITH RECOMPILE
AS
/* do something */

-- single statement version
SELECT * FROM Sales.SalesOrderHeader
OPTION (RECOMPILE);
```
- Optimize for value
```sql
ALTER PROCEDURE Get_OrderID_OrderQty
    @ProductID INT
AS
SELECT SalesOrderDetailID, OrderQty
FROM Sales.SalesOrderDetail
WHERE ProductID = @ProductID
OPTION (OPTIMIZE FOR (@ProductID=945));
```
- Optimize for unknown
```sql
CREATE PROCEDURE dbo.GetCities
    @StateProvinceID int
AS
SELECT DISTINCT City
FROM Person.Address
WHERE StateProvinceID = @StateProvinceID
OPTION (OPTIMIZE FOR UNKNOWN);
```
- Exceptions – just use branching logic, a.k.a. IF/ELSE (see the sketch below).
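A sketch of that branching approach, reusing the table from the examples above and assuming 945 is the exceptional value; the procedure name is made up. Each branch is a separate statement, so each gets its own plan compiled for a value that actually reaches it:

```sql
CREATE PROCEDURE dbo.Get_OrderID_OrderQty_Branched @ProductID INT
AS
IF @ProductID = 945
    -- this branch gets its own plan, compiled for the skewed value
    SELECT SalesOrderDetailID, OrderQty
    FROM Sales.SalesOrderDetail
    WHERE ProductID = @ProductID
ELSE
    -- all other values share a plan compiled for a typical value
    SELECT SalesOrderDetailID, OrderQty
    FROM Sales.SalesOrderDetail
    WHERE ProductID = @ProductID;
```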
- Clustered Index Scan – O(n)
- Index Scan – O(n)
- Index Seek – O(log(n))
- Key Lookup – O(1)
- Nested Loops – O(n*m)
- Merge Join – O(n + m)
- Hash Join – O(n + m)
- Use appropriate Indices (clustered, non-clustered)
- Inspect your WHERE clauses when performance is an issue
- Denormalization vs. many joins?
- Do not overuse indices
- Consider whether you need fast SELECTs or fast INSERTs/UPDATEs – indices slow down the latter.
- If you can, always prefer set-based operations over cursors (see the sketch below).
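As an illustration of the last point, the same price update done row by row with a cursor versus as one set-based statement (hypothetical Products table with Price and Discontinued columns):

```sql
-- Cursor: one round of work per row
DECLARE @ProductID INT;
DECLARE product_cursor CURSOR FOR SELECT ProductID FROM Products WHERE Discontinued = 0;
OPEN product_cursor;
FETCH NEXT FROM product_cursor INTO @ProductID;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE Products SET Price = Price * 1.10 WHERE ProductID = @ProductID;
    FETCH NEXT FROM product_cursor INTO @ProductID;
END;
CLOSE product_cursor;
DEALLOCATE product_cursor;

-- Set-based: the engine does the same work in one statement
UPDATE Products SET Price = Price * 1.10 WHERE Discontinued = 0;
```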