Contents

File Formats for data storage
    Text / ASCII / CSV File
    File Types
    Operating system and file-types
    CSV File format
    Data transfer format – Text / ASCII / CSV File
    W3C standardization for CSV
    CSV or Text file as database
    Programming languages support for ASCII
    Tools for converting spatial / non-spatial to csv
    Python library for csv
    Csvkit tool backed by Github
    Text database – q
    Linux column utility as csv viewer
    CSV in MYSQL
Managing data using Databases
    Databases – Understanding the DB-World
    Relational operations
    Key constraints and functional dependencies
        Superkey
        Candidate key
        Functional dependency
        Trivial functional dependency
        Closure
        Completion
        Irreducible cover
        Algorithm to derive candidate keys from functional dependencies
    Data Model – Overview
        Hierarchical model
        Network model
        Relational model
    DBMS – as software application
        General-purpose and special-purpose DBMSs
        History of DBMS
            The relational model
            Object databases
            2000s, NoSQL and NewSQL
    Design and modeling
        External, conceptual, and internal views
        Database Languages classification
    Performance, security, and availability
        Storage
        Materialized views
        Replication
        Security
        Transactions and concurrency
        Migration
        Building, maintaining, and tuning
        Backup and restore
        Other DBMS features
    List of some of the database products with license and stable version information
    Database vendors – Market share
    Advantages, features of databases
Operating System – Introduction
    Functions performed by an Operating System
        1. Operating System as a Resource Manager
        2. Storage Management
File Formats for data storage
The computer world is full of data, stored as sets of information in files. A computer file is a standard encoding of information for storage. Broadly, files fall into two main categories: text files and binary files. A file format specifies how the bits encode the information on the storage medium, and a format can be either free (open) or proprietary. Some formats are designed specifically for a particular type of data or usage, such as PNG files for images.28
Text / ASCII / CSV File
The text file format, also commonly known as flat file or ASCII format, can contain any stream of characters, including control characters (non-printable characters that have no representable symbol). Files such as HTML, SVG, program source code, CSV files, and logs from databases and Unix/Linux/Windows systems are all good examples of flat files/ASCII files/text files.

In the case of text files, one can open the file in any editor and read the contents, whether on Windows or Linux: Notepad, WordPad, MS-Word, the VIM editor, or commands such as type in DOS and cat in Linux can all display the character set of a flat file. The downside is that there is no security or encryption of the data. Also, every character, including spaces, typically occupies one byte of space, so a number such as 900900.00 takes 9 bytes of storage.
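To make the one-byte-per-character point concrete, here is a small Python check; the binary comparison against an 8-byte IEEE 754 double is an assumption added for contrast, not part of the original text:

```python
# A number stored as text occupies one byte per ASCII character,
# while the same value as a binary double fits in 8 bytes.
import struct

as_text = "900900.00".encode("ascii")
as_binary = struct.pack("d", 900900.00)  # IEEE 754 double

print(len(as_text))    # 9 bytes as text
print(len(as_binary))  # 8 bytes as a binary double
```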

Many times, different file extensions indicate the program a file is associated with and give a clear indication of its contents and behaviour; for example, .docx, .docm, and .xlsm are associated with the MS-Office Word and Excel programs. These files are binary in nature and tied to particular programs.

It is very important for us to understand file format types, especially the flat format or ASCII, as our research uses these file types for data reading and for operations based on these data store formats.

A text file, also commonly written "textfile" or alternatively referred to as a "flatfile", is a computer file format structured as a sequence of lines of electronic text (sequences of characters). A text file may end with one or more special characters, known as an end-of-file marker. Such markers were required under the MS-DOS operating system, but in modern operating systems such as Windows and Unix-like systems, text files do not need any special terminating character.

"Text file" refers to a type of container, while plain text refers to a type of content. Text files can contain plain text, but they are not limited to it.

File Types
At a basic, generic level, files are of two types depending on the data they contain: text files and binary files.1 Text files are commonly used for storing information because of their simplicity. This avoids some of the problems encountered with other file formats, such as differences in how numbers are stored (2 bytes versus 4 bytes, for example). Further, when data in a text file is corrupted, it is easier to recover and to continue processing the remaining contents of the file. One disadvantage of text files is that they tend to occupy more storage space.

In its simplest sense, a text file may need no additional metadata (other than knowledge of its character set) to assist the reader in interpretation. A text file may be zero bytes in size if it contains no data at all.

Among the most compatible subsets of character sets is the ASCII character set, which is used for English-language text files and is generally assumed to be the default in many situations. It covers American English characters, but for the British pound sign, the euro sign, or characters used outside English, a richer character set must be used. In many systems this choice is based on the computer's default locale setting. Before UTF-8 became widespread, this was traditionally a single-byte encoding such as ISO-8859-1 through ISO-8859-16 for European languages, or a wide character encoding for Asian languages.

Because many encodings can represent only a limited repertoire of characters, they are often usable only for text in a limited subset of human languages. Unicode is an attempt to create a common standard for representing all known languages, and most known character sets are subsets of the very large Unicode character set. UTF-8 is among the most commonly used encodings, although multiple other character encodings are available as well. One advantage of UTF-8 is backward compatibility with ASCII: every ASCII text file is also a UTF-8 text file with identical meaning. UTF-8 is also auto-detectable. Thus the common operating mode of UTF-8 capable software, when opening a file of unknown encoding, is to try UTF-8 first and fall back to a locale-dependent legacy encoding only when the content definitely isn't UTF-8.
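A minimal sketch of this try-UTF-8-first behaviour, assuming ISO-8859-1 as the locale fallback:

```python
def read_text(raw: bytes) -> str:
    """Try UTF-8 first; fall back to a legacy single-byte encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")  # assumed locale fallback

print(read_text(b"plain ascii"))               # ASCII is valid UTF-8
print(read_text("café".encode("utf-8")))       # decoded as UTF-8
print(read_text("café".encode("iso-8859-1")))  # falls back to Latin-1
```

The fallback triggers because a lone 0xE9 byte is not a valid UTF-8 sequence, which is exactly what makes UTF-8 auto-detectable in practice.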

Operating system and file-types
Operating systems like MS-DOS and Windows use a common text file format, in which each line of text is terminated by a two-character combination: carriage return (CR) followed by line feed (LF). It is common for the last line of text not to be terminated with a CR-LF marker, and many text editors (including Notepad) do not automatically insert one on the last line.

On Microsoft Windows operating systems, a file is regarded as a text file if the suffix of the name of the file (the “filename extension”) is “txt”. However, many other suffixes are used for text files with specific purposes. For example, source code for computer programs is usually kept in text files that have file name suffixes indicating the programming language in which the source is written.

Most Windows text files use "ANSI", "OEM", "Unicode" or "UTF-8" encoding. What Windows terminology calls "ANSI encodings" are usually single-byte ISO/IEC 8859 encodings (i.e. ANSI in the Microsoft Notepad menus is really "System Code Page", a non-Unicode legacy encoding), except in locales such as Chinese, Japanese and Korean that require double-byte character sets. ANSI encodings were traditionally used as default system locales within Windows before the transition to Unicode. By contrast, OEM encodings, also known as DOS code pages, were defined by IBM for use in the original IBM PC text mode display system. They typically include graphical and line-drawing characters common in DOS applications. "Unicode"-encoded Windows text files contain text in the UTF-16 Unicode Transformation Format. Such files normally begin with a Byte Order Mark (BOM), which communicates the endianness (sequential order of bytes) of the file content. Although UTF-8 does not suffer from endianness problems, many Windows programs (e.g. Notepad) prepend the contents of UTF-8-encoded files with a BOM,2 to differentiate UTF-8 from other 8-bit encodings.3
On operating systems such as Unix or Linux, the text file format is precisely described: POSIX defines a text file as a file containing characters organized into zero or more lines,4 where a line is a sequence of zero or more non-newline characters plus a terminating newline character,5 normally LF.

A printable file in POSIX is defined as a text file whose characters are printable, space, or backspace according to the regional rules; that is, it excludes most control characters, which are not printable.6
An operating system supports ASCII/text file formats as well as binary file formats, though binary support depends on whether the appropriate software has been installed and configured on that operating system. For text file support it may have internal or externally installed and configured editors or commands (utilities).

An operating system can offer multiple programs or editors, with a GUI (graphical user interface) or a CLI (command line interface), for creating and editing text-based files; the VIM editor on Linux is one example. These editors can be built in and shipped with the system, or be external utilities installed separately. Different options and features make different editors suitable for different working communities.

Much depends on the need, design, and contents when reading or creating text-based files such as log files. Beyond the contents themselves, the design and layout can make files more or less usable in practice. We will discuss this in the coming chapters.

CSV File format
Within the world of text files there is a special file format known as the CSV file, a delimited text file that uses the comma as separator. It usually contains data in tabular format, i.e. a row and column combination where one row represents one record, like a tuple in a database. In that sense, each file is a data-set file comprising many records of various information attributes.

The CSV file format is not fully standardized: a field can literally contain a comma as data, while the comma is also used for field separation, as discussed. Such situations are tricky and can be resolved by placing the data in quotes. Quotation marks do not solve every problem, however, so one may also need an escape character such as the backslash to handle these cases.
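This quoting behaviour can be seen with Python's csv module, which automatically quotes fields containing the delimiter and doubles embedded quote characters:

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)
# One field contains a comma, another contains quote characters.
writer.writerow(["Smith, John", "42", 'He said "hi"'])

line = buf.getvalue().strip()
print(line)  # "Smith, John",42,"He said ""hi"""

# Reading the line back recovers the original three fields intact.
fields = next(csv.reader(io.StringIO(line)))
print(fields)
```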

Instead of the CSV format, which uses the comma as field separator, one can easily replace the comma with some other delimiter character. Everything else remains the same, but the file is now a different delimiter-separated format, and the name CSV no longer quite fits the idea of a non-comma delimiter: if the new separator is the tab character, it becomes a tab-separated values file, and if the separator is a space, it becomes a space-separated values data file.

In scenarios where the separator is some other character, these files are often still given the .csv extension despite the non-comma field separator, which can create problems in data exchange.
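Python's csv module handles these variants through its delimiter parameter; a small sketch with tab-separated data:

```python
import csv
import io

# The same parser reads tab-separated values when told the delimiter.
tsv_data = "name\tage\nAlice\t30\n"
rows = list(csv.reader(io.StringIO(tsv_data), delimiter="\t"))
print(rows)  # [['name', 'age'], ['Alice', '30']]
```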

Data transfer format – Text / ASCII / CSV File
Text files have also been used as a data transfer format to remote servers, commonly known as data information streams. This is apart from the common use of flat files as a data storage format in databases and CMSs (content management systems). In recent years, however, XML has gained popularity for data transfer to remote servers, as it not only contains data but describes it as well. Some industry specialists still consider XML file size a serious concern, as XML files are big and bulky; the solution to this is XML compression. A good example is mobile phone data using the XML format.

W3C standardization for CSV
The W3C has tried to standardize the CSV file by introducing a set of recommendations for creating tabular data.6 It was IBM Fortran that first supported CSV, in 1972, largely because it was easy to type comma-separated lists on punched cards, the way Fortran was coded, and this was less prone to errors. Around the same time RDBMSs were developed, and CSV export/import became the way to transfer data to and from databases. Just as SQL is still around doing its job fine, CSV has also been around, continuing its good work. SQLizer and many other tools convert CSV to SQL database format and vice versa. Nowadays there are many built-in database utilities for this very purpose, where data can be imported or exported between the database and CSV or text format; examples are BCP in MS SQL Server and SAP Sybase, COPY FROM and COPY TO in PostgreSQL, and SQL*Loader in Oracle.6
CSV or Text file as database
Interestingly, the flat file database concept also floats in the market; it consists of nothing but data. It is a single-file database, with a row and column combination forming the data store, in which no links or relationships are stored between records and fields; much depends on the table structure or design. The terminology used to describe a flat file database can differ, but by and large the meaning remains the same: for example, a MySQL database uses the concept of queries, while FileMaker uses the term "find" for the filter-and-fetch operation. So, by and large, the concept remains the same.
Plain text files usually contain one record per line (or per row). Using delimiters, as in CSV or space-delimited files, incurs some overhead in size and performance, since locating each record takes time compared with fixed-size records.
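The fixed-size advantage comes from random access by simple arithmetic; a sketch with an assumed 16-byte record width:

```python
RECORD_SIZE = 16  # assumed fixed record width for this illustration

records = ["alice      30   ", "bob        25   ", "carol      41   "]
data = "".join(records).encode("ascii")

# Record i starts at a computable offset; no scanning for delimiters.
i = 2
start = i * RECORD_SIZE
print(data[start:start + RECORD_SIZE].decode())
```

With a delimited file, finding record i instead requires scanning the bytes and counting newline or comma separators from the start.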

Common and easy examples of flat files are the group and password files on Linux-like operating systems, such as /etc/passwd or /etc/group.1
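Such files are colon-delimited, one record per line, and can be treated as a tiny flat-file database; sample data is used below rather than reading the real /etc/passwd:

```python
# Two sample lines in /etc/passwd format: name:passwd:uid:gid:gecos:home:shell
sample = """root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000:Alice:/home/alice:/bin/bash"""

users = {}
for line in sample.splitlines():
    name, _, uid, gid, gecos, home, shell = line.split(":")
    users[name] = {"uid": int(uid), "home": home, "shell": shell}

print(users["alice"]["home"])  # /home/alice
```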
Some practical examples of text-file-based database implementations are: TextDB, a file-based database designed to handle high loads; the MySQL CSV storage engine for MySQL 5.x; Berkeley DB, a robust flat file database for critical applications that supports ACID transactions; Borland Reflex; Mimesis, an FFDB written in PHP4 that uses multiple files and a heap method of storage; and Flat File Checker, an open-source data validation application for flat files. TheIntegrationEngineer describes how delimited and fixed-position (fixed-width) files compare and are used.2
Apart from these examples where flat text files are used as databases, October is an example of a content management system featuring a flat-file-based database, with Laravel as its web framework.345
Programming languages support for ASCII
Any programming language that supports I/O and string manipulation or processing is well capable of reading and writing CSV files. Common examples of languages with CSV support are BASIC, C++, C#, Pascal, Delphi, Go, Java, MATLAB, Lisp, Perl, PHP, Python, R, Tcl, VBScript/JavaScript, VB, PowerShell, etc. There are many utilities for reformatting or comparing CSV, such as the Perl utilities csvprint and csvdiff. Over the years many editors have been developed to work effectively with the CSV file format, such as CSVed, CSVJSON, CSV Editor Pro, MS Excel, and many more.7
Tools for converting spatial / non-spatial to csv
For converting spatial and non-spatial data to CSV, one can refer to Safe Software's FME Workbench converter and its related tutorials; it makes complex data integration tasks look simple.8 FME Workbench desktop and trial versions are available for hands-on use. Another example is the mailjet safesent email solution utility and app, which works with CSV and data-sets.9
Python library for csv
Python is one of the most talked-about names in recent times. The Python professional community understands the importance of CSV and its usage in the data community; for this reason Python has a dedicated csv library, and one can check the documentation at docs.python.org/3/library/csv.html for more clarity. Python's CSV functions and classes, such as reader, writer, csv.DictReader, csv.DictWriter, and writerow, along with string methods like splitlines() and split() and support for JSON/CSV conversion, make it a favorable choice.10
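A short round trip with the classes named above, using assumed sample data:

```python
import csv
import io

# Write records with csv.DictWriter ...
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["city", "population"])
writer.writeheader()
writer.writerow({"city": "Pune", "population": 3124458})

# ... and read them back with csv.DictReader, which maps each row
# to a dict keyed by the header line.
reader = csv.DictReader(io.StringIO(buf.getvalue()))
for row in reader:
    print(row["city"], row["population"])  # Pune 3124458
```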
Csvkit tool backed by Github
In today's world, when data science and data crunching are among the most discussed topics, GitHub-hosted tools and suites like csvkit cannot be ignored. Csvkit is a set of purely command-line tools for converting to and working with CSV, the so-called king of tabular file formats. Csvkit can be broadly divided into three segments: input, processing, and output. Csvkit was inspired by gdal, pdftk, and the original csvcut tool by Aaron Bycoffe and Joe Germuska. For input, csvkit has commands such as in2csv and sql2csv; processing commands include csvclean, csvcut, csvgrep, csvjoin, csvsort, etc.; and for output and analysis of data one can use commands like csvformat, csvjson, csvlook, csvpy, csvsql, etc. Christopher Groskopf, Joe Germuska, Aaron Bycoffe, Travis Mehlinger, Alejandro Companioni, Benjamin Wilson, Bryan Silverthorn, and many more authors have contributed to the csvkit code. One may refer to https://csvkit.readthedocs.org11
Text database – q
A very strong alternative to csvsql is q. One can understand q as "text as a database": q is a tool that treats all sorts of CSV or flat files as databases, allowing you to query text files containing comma-separated, space-separated, or any other character-delimited data. Querying for any sort of data output can then be achieved using SQL (Structured Query Language), a declarative language for data. The bigger objective of the q tool is to act as a bridge between flat files and SQL.15 The q program is free software, covered under the terms of the GNU license as published by the Free Software Foundation. As discussed, the goal of q is to run SQL-like statements on text or flat files containing comma- or other-character-delimited data: to bring SQL's expressive power to the command line and to treat text data as database entities. Since q acts like an SQL query engine, standard SQL expressions and clauses such as WHERE, GROUP BY, HAVING, and ORDER BY are supported. Joins and sub-queries are supported in the WHERE clause but not in the FROM clause. The SQL syntax resembles SQLite's. For full online support and information one can refer to the github.io/q usage page. It is worth noting that a Python API was developed specifically for q, so Python modules can access all of q's command-line capabilities, including query execution and analysis; one can install the PyPI package using pip install.121314
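The bridge q provides between flat files and SQL can be imitated in a few lines with Python's standard csv and sqlite3 modules; this is an illustration of the idea, not q's own implementation:

```python
import csv
import io
import sqlite3

csv_text = "name,dept,salary\nalice,eng,90\nbob,eng,70\ncarol,hr,60\n"
rows = list(csv.reader(io.StringIO(csv_text)))

# Load the flat file into an in-memory table, then query it with SQL.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (name TEXT, dept TEXT, salary INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows[1:])

for dept, total in con.execute(
        "SELECT dept, SUM(salary) FROM t GROUP BY dept ORDER BY dept"):
    print(dept, total)
```

q performs essentially this load-and-query step transparently, so the user only ever sees the SQL statement and the file name.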
Linux column utility as csv viewer
On Linux, the column command can serve as a CSV viewer: column -s, -t CSV.txt reads the comma-separated file and displays it in a formatted fashion, compared with the vi or cat commands.18 Otherwise, as discussed, one can install and use a tool such as csvtool, for example on Ubuntu.

CSV in MYSQL
The MySQL database has a pluggable architecture and supports a variety of engines; one such engine and extension is the CSV engine. The CSV storage engine stores data in text files in CSV format and is compiled into the MySQL server, so if one needs to examine the source for the CSV engine, look in the storage/csv directory of the MySQL source distribution. In MySQL every table has a .frm file, and CSV tables are no exception, but alongside it a .csv file is created that actually contains the data in comma-separated form; the metadata is kept in a .csm file. The big thing is that the .csv file contains readable textual data separated by commas, so one can open this file in any CSV editor, including spreadsheets like Excel (if it is not locked by another program). The MySQL error log can also be maintained as a CSV file (table) and then queried using SQL commands.16

Managing data using Databases
Today's personal and professional life depends heavily on data. This can take the form of personal contacts, professional networking contacts, transactions with banking institutions, emails, social media, logs, or anything of the kind.

The new era's digital lifestyle and dependency on digital computing, together with the growing popularity of social media, pose a new challenge: such huge volumes of produced data must be handled and re-used.

For this very purpose, different types of database management systems have been introduced by the industry, and no doubt they have solved this problem to a great extent. Every day, new features and new techniques are introduced to make things possible at lower cost and with higher performance and efficiency.

There are close to 350 databases in the world, which come in row-oriented, column-oriented, and document-oriented categories. Every database follows some common basic features and, at the same time, offers specific features, architecture, or functionality that makes it the best fit for one business need or another.

Databases – Understanding the DB-World
A database is an organized collection of data, stored and accessed using software. It is a collection of data for future use, stored in the form of a file, text or binary of some standard, which makes it a permanent, persistent copy on disk or other media. When a program or specific software accesses it, its availability becomes fast and accurate (depending hugely on the program and hardware combination, along with the logic and algorithms used). The manner in which data is fundamentally structured in a logical way is the data model, and the most common and famous among them is the relational model, which uses a table-based structure. It was first described by E. F. Codd in 1969; all data is represented in terms of tuples grouped into relations.20 The purpose of the relational model is to provide a declarative method for specifying data and queries: users directly state what information the database contains and what information they want to extract from it. Most relational database management systems use SQL (Structured Query Language) as their data definition and query language.
Alternatives to the relational model are the hierarchical model and the network model; the newer object-oriented database model also exists.20

The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations, an n-ary relation being a subset of the Cartesian product of n domains. In the mathematical model, reasoning about such data is done in two-valued predicate logic, meaning there are two possible evaluations for each proposition: either true or false (and in particular no third value such as unknown, or not applicable, either of which are often associated with the concept of NULL). Data are operated upon by means of a relational calculus or relational algebra, these being equivalent in expressive power.

The relational model of data permits the database designer to create a consistent, logical representation of information. Consistency is achieved by including declared constraints in the database design. The theory includes a process of database normalization whereby a design with certain desirable properties can be selected from a set of logically equivalent alternatives. The access plans and other implementation and operation details are handled by the DBMS engine, and are not reflected in the logical model. This contrasts with common practice for SQL DBMSs in which performance tuning often requires changes to the logical model.

The basic relational building block is the domain or data type, usually abbreviated nowadays to type. A tuple is an ordered set of attribute values. An attribute is an ordered pair of attribute name and type name. An attribute value is a specific valid value for the type of the attribute. This can be either a scalar value or a more complex type.

A relation consists of a heading and a body. A heading is a set of attributes. A body (of an n-ary relation) is a set of n-tuples. The heading of the relation is also the heading of each of its tuples.

A relation is defined as a set of n-tuples. In both mathematics and the relational database model, a set is an unordered collection of unique, non-duplicated items, although some DBMSs impose an order to their data. In mathematics, a tuple has an order, and allows for duplication. E. F. Codd originally defined tuples using this mathematical definition.20 Later, it was one of E. F. Codd’s great insights that using attribute names instead of an ordering would be so much more convenient (in general) in a computer language based on relations20. This insight is still being used today. Though the concept has changed, the name “tuple” has not. An immediate and important consequence of this distinguishing feature is that in the relational model the Cartesian product becomes commutative.

A table is an accepted visual representation of a relation; a tuple is similar to the concept of a row.

A relvar is a named variable of some specific relation type, to which at all times some relation of that type is assigned, though the relation may contain zero tuples.

The basic principle of the relational model is the Information Principle: all information is represented by data values in relations. In accordance with this Principle, a relational database is a set of relvars and the result of every query is presented as a relation.

The consistency of a relational database is enforced, not by rules built into the applications that use it, but rather by constraints, declared as part of the logical schema and enforced by the DBMS for all applications. In general, constraints are expressed using relational comparison operators, of which just one, "is subset of" (⊆), is theoretically sufficient20. In practice, several useful shorthands are expected to be available, of which the most important are candidate key (really, superkey) and foreign key constraints.

Relational operations
Users (or programs) request data from a relational database by sending it a query written in a special language, usually a dialect of SQL. Although SQL was originally intended for end-users, it is much more common for SQL queries to be embedded into software that provides an easier user interface. Many Web sites (almost all) perform SQL queries when generating pages.

In response to a query, the database returns a result set, which is just a list of rows containing the answers. The simplest query is just to return all the rows from a table, but more often, the rows are filtered in some way to return just the answer wanted.

Often, data from multiple tables are combined into one, by doing a join. Conceptually, this is done by taking all possible combinations of rows (the Cartesian product), and then filtering out everything except the answer. In practice, relational database management systems rewrite (“optimize”) queries to perform faster, using a variety of techniques.

There are a number of relational operations in addition to join. These include project (the process of eliminating some of the columns), restrict (the process of eliminating some of the rows), union (a way of combining two tables with similar structures), difference (that lists the rows in one table that are not found in the other), intersect (that lists the rows found in both tables), and product (mentioned above, which combines each row of one table with each row of the other). Depending on which other sources you consult, there are a number of other operators – many of which can be defined in terms of those listed above. These include semi-join, outer operators such as outer join and outer union, and various forms of division. Then there are operators to rename columns, and summarizing or aggregating operators, and if you permit relation values as attributes (RVA – relation-valued attribute), then operators such as group and ungroup. The SELECT statement in SQL serves to handle all of these except for the group and ungroup operators.
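Several of the operators listed above can be written directly over relations modeled as sets of attribute/value mappings; this is an illustrative Python sketch, not how any particular DBMS implements them:

```python
# A relation is a set of tuples; each tuple is a frozenset of
# (attribute, value) pairs, so ordering is irrelevant and duplicates vanish.
def rel(*rows):
    return {frozenset(r.items()) for r in rows}

def project(r, attrs):
    """Eliminate some of the columns (project)."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}

def restrict(r, pred):
    """Eliminate some of the rows (restrict)."""
    return {t for t in r if pred(dict(t))}

def join(r, s):
    """Natural join: keep combinations that agree on shared attributes."""
    return {t | u for t in r for u in s
            if all(dict(u).get(a, v) == v for a, v in t)}

emp = rel({"name": "alice", "dept": "eng"}, {"name": "bob", "dept": "hr"})
dept = rel({"dept": "eng", "floor": 3}, {"dept": "hr", "floor": 1})

print(project(emp, {"name"}))
print(restrict(emp, lambda t: t["dept"] == "eng"))
print(join(emp, dept))
```

Union, difference, and intersect need no code of their own here: on relations with the same heading they are simply Python's set operators |, -, and &, mirroring the relational definitions.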

The flexibility of relational databases allows programmers to write queries that were not anticipated by the database designers. As a result, relational databases can be used by multiple applications in ways the original designers did not foresee, which is especially important for databases that might be used for a long time (perhaps several decades). This has made the idea and implementation of relational databases very popular with businesses.20
Key constraints and functional dependencies
One of the simplest and most important types of relation constraints is the key constraint. It tells us that in every instance of a certain relational schema the tuples can be identified by their values for certain attributes.

Superkey
A superkey is written as a finite set of attribute names.

A superkey K holds in a relation (H, B) if:
K ⊆ H, and
there exist no two distinct tuples t1, t2 ∈ B such that t1[K] = t2[K].

A superkey holds in a relation universe U if it holds in all relations in U.

Theorem: A superkey K holds in a relation universe U over H if and only if K ⊆ H and K → H holds in U.
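The definition above can be checked mechanically; a small Python sketch, with tuples modeled as plain dicts:

```python
def is_superkey(K, heading, body):
    """K is a superkey of (heading, body) if K ⊆ heading and no two
    distinct tuples of the body agree on all attributes in K."""
    if not K <= heading:
        return False
    seen = set()
    for t in body:  # t maps attribute name -> value
        key = tuple(t[a] for a in sorted(K))
        if key in seen:
            return False
        seen.add(key)
    return True

H = {"id", "name", "dept"}
B = [{"id": 1, "name": "alice", "dept": "eng"},
     {"id": 2, "name": "alice", "dept": "hr"}]

print(is_superkey({"id"}, H, B))    # True
print(is_superkey({"name"}, H, B))  # False: two tuples share the name
```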

Candidate key
A superkey K holds as a candidate key for a relation universe U if it holds as a superkey for U and there is no proper subset of K that also holds as a superkey for U.

Functional dependency
A functional dependency (FD for short) is written as X → Y for X, Y finite sets of attribute names.

A functional dependency X → Y holds in a relation (H, B) if:
X, Y ⊆ H and
for all tuples t1, t2 ∈ B: t1[X] = t2[X] ⇒ t1[Y] = t2[Y].

A functional dependency X → Y holds in a relation universe U if it holds in all relations in U.
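Checking whether an FD holds in a given relation is a direct translation of the definition: every pair of tuples that agrees on X must also agree on Y. A minimal sketch, with the body as a list of dicts and invented data:

```python
def fd_holds(B, X, Y):
    """True iff X → Y holds in the relation body B."""
    seen = {}
    for t in B:
        x_val = tuple(sorted((a, t[a]) for a in X))
        y_val = tuple(sorted((a, t[a]) for a in Y))
        if seen.setdefault(x_val, y_val) != y_val:
            return False    # two tuples agree on X but differ on Y
    return True

B = [
    {"emp": 1, "dept": 10, "city": "Pune"},
    {"emp": 2, "dept": 10, "city": "Pune"},
    {"emp": 3, "dept": 20, "city": "Agra"},
]
print(fd_holds(B, {"dept"}, {"city"}))  # True: dept determines city
print(fd_holds(B, {"city"}, {"emp"}))   # False: Pune maps to emp 1 and 2
```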

Trivial functional dependency
A functional dependency is trivial under a header H if it holds in all relation universes over H.

Theorem: An FD X → Y is trivial under a header H if and only if Y ⊆ X ⊆ H.

Closure
Armstrong’s axioms: The closure of a set of FDs S under a header H, written as S+, is the smallest superset of S such that:
Y ⊆ X ⊆ H ⇒ X → Y ∈ S+ (reflexivity)
X → Y ∈ S+ and Y → Z ∈ S+ ⇒ X → Z ∈ S+ (transitivity)
X → Y ∈ S+ and Z ⊆ H ⇒ (X ∪ Z) → (Y ∪ Z) ∈ S+ (augmentation)

Theorem: Armstrong’s axioms are sound and complete; given a header H and a set S of FDs that only contain subsets of H, X → Y ∈ S+ if and only if X → Y holds in all relation universes over H in which all FDs in S hold.

Completion
The completion of a finite set of attributes X under a finite set of FDs S, written as X+, is the smallest superset of X such that:
Y → Z ∈ S and Y ⊆ X+ ⇒ Z ⊆ X+

The completion of an attribute set can be used to compute whether a certain dependency is in the closure of a set of FDs.

Theorem: Given a set S of FDs, X → Y ∈ S+ if and only if Y ⊆ X+.
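The completion is computed by the standard fixpoint iteration: keep adding Z whenever some Y → Z in S has Y already inside the growing set. A minimal sketch, representing each FD as a pair of frozensets (the FDs themselves are invented for the example):

```python
def completion(X, S):
    """X+ under S, where S is a list of (Y, Z) frozenset pairs."""
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for Y, Z in S:
            if Y <= closure and not Z <= closure:
                closure |= Z     # Y → Z fires: add Z to X+
                changed = True
    return closure

S = [(frozenset({"A"}), frozenset({"B"})),
     (frozenset({"B"}), frozenset({"C"}))]
print(completion({"A"}, S))   # {'A', 'B', 'C'}
# By the theorem, A → C is in S+ because {"C"} ⊆ completion({"A"}, S).
```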

Irreducible cover
An irreducible cover of a set S of FDs is a set T of FDs such that:
S+ = T+,
there exists no U ⊂ T such that S+ = U+,
X → Y ∈ T ⇒ Y is a singleton set, and
X → Y ∈ T and Z ⊂ X ⇒ Z → Y ∉ S+.

Algorithm to derive candidate keys from functional dependencies
INPUT: a set S of FDs that contain only subsets of a header H
OUTPUT: the set C of superkeys that hold as candidate keys in
all relation universes over H in which all FDs in S hold
begin
  C := ∅;          // found candidate keys
  Q := { H };      // superkeys that contain candidate keys
  while Q ≠ ∅ do
    let K be some element from Q;
    Q := Q – { K };
    minimal := true;
    for each X → Y in S do
      K’ := (K – Y) ∪ X;   // derive new superkey
      if K’ ⊂ K then
        minimal := false;
        Q := Q ∪ { K’ };
      end if
    end for
    if minimal and there is no proper subset of K in C then
      remove all supersets of K from C;
      C := C ∪ { K };
    end if
  end while
end
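The pseudocode above transcribes almost line for line into Python. The sketch below uses frozensets for attribute sets and a list of (X, Y) pairs for S; the header and FDs are invented for the example.

```python
def candidate_keys(H, S):
    """Candidate keys of header H under FDs S (list of (X, Y) pairs)."""
    C = set()             # found candidate keys
    Q = {frozenset(H)}    # superkeys that contain candidate keys
    while Q:
        K = Q.pop()
        minimal = True
        for X, Y in S:
            K2 = (K - Y) | X             # derive new superkey
            if K2 < K:                   # proper subset of K
                minimal = False
                Q.add(K2)
        if minimal and not any(c < K for c in C):
            C = {c for c in C if not c > K}   # remove supersets of K
            C.add(K)
    return C

H = frozenset({"A", "B", "C"})
S = [(frozenset({"A"}), frozenset({"B"}))]      # A → B
print(candidate_keys(H, S))   # {frozenset({'A', 'C'})}
```

With A → B as the only FD, B is derivable from A, so {A, C} determines the whole header while no proper subset of it does.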
Data Model – Overview
Hierarchical model
In a hierarchical model, data is organized into a tree-like structure, implying a single parent for each record. A sort field keeps sibling records in a particular order. Hierarchical structures were widely used in the early mainframe database management systems, such as the Information Management System (IMS) by IBM, and now describe the structure of XML documents. This structure allows one one-to-many relationship between two types of data. It is very efficient for describing many relationships in the real world: recipes, tables of contents, ordering of paragraphs/verses, or any nested and sorted information.
This hierarchy is used as the physical order of records in storage. Record access is done by navigating downward through the data structure using pointers combined with sequential accessing. Because of this, the hierarchical structure is inefficient for certain database operations when a full path (as opposed to upward link and sort field) is not also included for each record. Such limitations have been compensated for in later IMS versions by additional logical hierarchies imposed on the base physical hierarchy.

Network model
The network model expands upon the hierarchical structure, allowing many-to-many relationships in a tree-like structure that allows multiple parents. It was most popular before being replaced by the relational model, and is defined by the CODASYL specification.

The network model organizes data using two fundamental concepts, called records and sets. Records contain fields (which may be organized hierarchically, as in the programming language COBOL). Sets (not to be confused with mathematical sets) define one-to-many relationships between records: one owner, many members. A record may be an owner in any number of sets, and a member in any number of sets.

A set consists of circular linked lists where one record type, the set owner or parent, appears once in each circle, and a second record type, the subordinate or child, may appear multiple times in each circle. In this way a hierarchy may be established between any two record types, e.g., type A is the owner of B. At the same time another set may be defined where B is the owner of A. Thus all the sets comprise a general directed graph (ownership defines a direction), or network construct. Access to records is either sequential (usually in each record type) or by navigation in the circular linked lists.

The network model is able to represent redundancy in data more efficiently than in the hierarchical model, and there can be more than one path from an ancestor node to a descendant. The operations of the network model are navigational in style: a program maintains a current position, and navigates from one record to another by following the relationships in which the record participates. Records can also be located by supplying key values.

Although it is not an essential feature of the model, network databases generally implement the set relationships by means of pointers that directly address the location of a record on disk. This gives excellent retrieval performance, at the expense of operations such as database loading and reorganization.

Popular DBMS products that utilized it were Cincom Systems’ Total and Cullinet’s IDMS. IDMS gained a considerable customer base; in the 1980s, it adopted the relational model and SQL in addition to its original tools and languages.

Most object databases (invented in the 1990s) use the navigational concept to provide fast navigation across networks of objects, generally using object identifiers as “smart” pointers to related objects. Objectivity/DB, for instance, implements named one-to-one, one-to-many, many-to-one, and many-to-many relationships that can cross databases. Many object databases also support SQL, combining the strengths of both models.
Relational model
The relational model was introduced by E.F. Codd in 1970 as a way to make database management systems more independent of any particular application. It is a mathematical model defined in terms of predicate logic and set theory, and implementations of it have been used by mainframe, midrange and microcomputer systems.

The products that are generally referred to as relational databases in fact implement a model that is only an approximation to the mathematical model defined by Codd. Three key terms are used extensively in relational database models: relations, attributes, and domains. A relation is a table with columns and rows. The named columns of the relation are called attributes, and the domain is the set of values the attributes are allowed to take.

The basic data structure of the relational model is the table, where information about a particular entity (say, an employee) is represented in rows (also called tuples) and columns. Thus, the “relation” in “relational database” refers to the various tables in the database; a relation is a set of tuples. The columns enumerate the various attributes of the entity (the employee’s name, address or phone number, for example), and a row is an actual instance of the entity (a specific employee) that is represented by the relation. As a result, each tuple of the employee table represents various attributes of a single employee.

All relations (and, thus, tables) in a relational database have to adhere to some basic rules to qualify as relations. First, the ordering of columns is immaterial in a table. Second, there can’t be identical tuples or rows in a table. And third, each tuple will contain a single value for each of its attributes.

A relational database contains multiple tables, each similar to the one in the “flat” database model. One of the strengths of the relational model is that, in principle, any value occurring in two different records (belonging to the same table or to different tables), implies a relationship among those two records. Yet, in order to enforce explicit integrity constraints, relationships between records in tables can also be defined explicitly, by identifying or non-identifying parent-child relationships characterized by assigning cardinality (1:1, (0)1:M, M:M). Tables can also have a designated single attribute or a set of attributes that can act as a “key”, which can be used to uniquely identify each tuple in the table.

A key that can be used to uniquely identify a row in a table is called a primary key. Keys are commonly used to join or combine data from two or more tables. For example, an Employee table may contain a column named Location which contains a value that matches the key of a Location table. Keys are also critical in the creation of indexes, which facilitate fast retrieval of data from large tables. Any column can be a key, or multiple columns can be grouped together into a compound key. It is not necessary to define all the keys in advance; a column can be used as a key even if it was not originally intended to be one.
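The Employee/Location join described above can be tried with Python's built-in sqlite3 module; the table and column names below are invented for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE location (loc_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY,
                           name   TEXT,
                           location INTEGER REFERENCES location);
    INSERT INTO location VALUES (1, 'Pune'), (2, 'Agra');
    INSERT INTO employee VALUES (10, 'Ann', 1), (11, 'Bob', 2);
""")
# Join on the key: each employee row's location value matches a
# location row's primary key.
rows = con.execute("""
    SELECT e.name, l.city
    FROM employee e JOIN location l ON e.location = l.loc_id
    ORDER BY e.name
""").fetchall()
print(rows)   # [('Ann', 'Pune'), ('Bob', 'Agra')]
```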
A key that has an external, real-world meaning (such as a person’s name, a book’s ISBN, or a car’s serial number) is sometimes called a “natural” key. If no natural key is suitable (think of the many people named Brown), an arbitrary or surrogate key can be assigned (such as by giving employees ID numbers). In practice, most databases have both generated and natural keys, because generated keys can be used internally to create links between rows that cannot break, while natural keys can be used, less reliably, for searches and for integration with other databases. (For example, records in two independently developed databases could be matched up by social security number, except when the social security numbers are incorrect, missing, or have changed.)
The most common query language used with the relational model is the Structured Query Language (SQL).

Apart from these common data models there are others as well, such as:
Graph model
Multivalue model (similar to a compressed form of XML)

DBMS – as software application
As discussed, a database-management system (DBMS) is a computer-software application that interacts with end-users, other applications, and the database itself to capture and analyze data. (Sometimes a DBMS is loosely referred to as a “database”.) A general-purpose DBMS allows the definition, creation, querying, update, and administration of databases. A database is generally stored in a DBMS-specific format which is not portable, but different DBMSs can share data by using standards such as SQL and ODBC or JDBC.

Computer scientists may classify database-management systems according to the database models that they support. Relational databases became dominant in the 1980s. These model data as rows and columns in a series of tables, and the vast majority use SQL for writing and querying data. In the 2000s, non-relational databases became popular, referred to as NoSQL because they use different query languages.

Physically, database servers are dedicated computers that hold the actual databases and run only the DBMS and related software. Database servers are usually multiprocessor computers, with generous memory and RAID disk arrays used for stable storage. RAID is used for recovery of data if any of the disks fail. Hardware database accelerators, connected to one or more servers via a high-speed channel, are also used in large volume transaction processing environments. DBMSs are found at the heart of most database applications. DBMSs may be built around a custom multitasking kernel with built-in networking support, but modern DBMSs typically rely on a standard operating system to provide these functions.

Since DBMSs comprise a significant market, computer and storage vendors often take into account DBMS requirements in their own development plans.
Databases and DBMSs can be categorized according to the database model(s) that they support (such as relational or XML), the type(s) of computer they run on (from a server cluster to a mobile phone), the query language(s) used to access the database (such as SQL or XQuery), and their internal engineering, which affects performance, scalability, resilience, and security.

General-purpose and special-purpose DBMSs
A DBMS may become a complex software system, and its development typically requires thousands of human-years of development effort. Some general-purpose DBMSs such as Adabas, Oracle and DB2 have been upgraded since the 1970s. General-purpose DBMSs aim to meet the needs of as many applications as possible, which adds to the complexity. However, since their development cost can be spread over a large number of users, they are often the most cost-effective approach. On the other hand, a general-purpose DBMS may introduce unnecessary overhead. Therefore, many systems use a special-purpose DBMS. A common example is an email system that performs many of the functions of a general-purpose DBMS, such as the insertion and deletion of messages composed of various items of data or associating messages with a particular email address; but these functions are limited to what is required to handle email and don’t provide the user with all of the functionality that would be available using a general-purpose DBMS.

Application software can often access a database on behalf of end-users, without exposing the DBMS interface directly. Application programmers may use a wire protocol directly, or more likely through an application programming interface. Database designers and database administrators interact with the DBMS through dedicated interfaces to build and maintain the applications’ databases, and thus need some more knowledge and understanding about how DBMSs operate and the DBMSs’ external interfaces and tuning parameters.

History of DBMS
The sizes, capabilities, and performance of databases and their respective DBMSs have grown in orders of magnitude. These performance increases were enabled by the technology progress in the areas of processors, computer memory, computer storage, and computer networks. The development of database technology can be divided into three eras based on data model or structure: navigational, SQL/relational, and post-relational.

The two main early navigational data models were the hierarchical model, epitomized by IBM’s IMS system, and the CODASYL model (network model), implemented in a number of products such as IDMS.

The relational model
First proposed in 1970 by Edgar F. Codd, the relational model departed from this tradition by insisting that applications should search for data by content, rather than by following links. The relational model employs sets of ledger-style tables, each used for a different type of entity. Only in the mid-1980s did computing hardware become powerful enough to allow the wide deployment of relational systems (DBMSs plus applications). By the early 1990s, however, relational systems dominated in all large-scale data processing applications, and as of 2018 they remain dominant: IBM DB2, Oracle, MySQL, and Microsoft SQL Server are the top DBMSs. The dominant database language, standardised SQL for the relational model, has influenced database languages for other data models.

Object databases
Object databases were developed in the 1980s to overcome the inconvenience of the object-relational impedance mismatch, which led to the coining of the term “post-relational” and also the development of hybrid object-relational databases.

The next generation of post-relational databases in the late 2000s became known as NoSQL databases, introducing fast key-value stores and document-oriented databases. A competing “next generation” known as NewSQL databases attempted new implementations that retained the relational/SQL model while aiming to match the high performance of NoSQL compared to commercially available relational DBMSs.

2000s, NoSQL and NewSQL
XML databases are a type of structured document-oriented database that allows querying based on XML document attributes. XML databases are mostly used in enterprise database management, where XML is being used as the machine-to-machine data interoperability standard. XML database management systems include the commercial software MarkLogic and Oracle Berkeley DB XML, and the free-to-use Clusterpoint Distributed XML/JSON Database. All are enterprise software database platforms and support industry-standard ACID-compliant transaction processing with strong database consistency characteristics and a high level of database security.
NoSQL databases are often very fast, do not require fixed table schemas, avoid join operations by storing denormalized data, and are designed to scale horizontally. The most popular NoSQL systems include MongoDB, Couchbase, Riak, Memcached, Redis, CouchDB, Hazelcast, Apache Cassandra, and HBase, which are all open-source software products.

In recent years, there has been a strong demand for massively distributed databases with high partition tolerance, but according to the CAP theorem it is impossible for a distributed system to simultaneously provide consistency, availability, and partition tolerance guarantees. A distributed system can satisfy any two of these guarantees at the same time, but not all three. For that reason, many NoSQL databases are using what is called eventual consistency to provide both availability and partition tolerance guarantees with a reduced level of data consistency.

NewSQL is a class of modern relational databases that aims to provide the same scalable performance of NoSQL systems for online transaction processing (read-write) workloads while still using SQL and maintaining the ACID guarantees of a traditional database system. Such databases include Google F1/Spanner, Citus, CockroachDB, TiDB, ScaleBase, MemSQL, NuoDB, and VoltDB.

Examples
One way to classify databases involves the type of their contents, for example: bibliographic, document-text, statistical, or multimedia objects. Another way is by their application area, for example: accounting, music compositions, movies, banking, manufacturing, or insurance. A third way is by some technical aspect, such as the database structure or interface type. This section lists a few of the adjectives used to characterize different kinds of databases.

An in-memory database is a database that primarily resides in main memory, but is typically backed-up by non-volatile computer data storage. Main memory databases are faster than disk databases, and so are often used where response time is critical, such as in telecommunications network equipment.

An active database includes an event-driven architecture which can respond to conditions both inside and outside the database. Possible uses include security monitoring, alerting, statistics gathering and authorization. Many databases provide active database features in the form of database triggers.

A cloud database relies on cloud technology. Both the database and most of its DBMS reside remotely, “in the cloud”, while its applications are both developed by programmers and later maintained and utilized by end-users through a web browser and Open APIs.

Data warehouses archive data from operational databases and often from external sources such as market research firms. The warehouse becomes the central source of data for use by managers and other end-users who may not have access to operational data. For example, sales data might be aggregated to weekly totals and converted from internal product codes to use UPCs so that they can be compared with ACNielsen data. Some basic and essential components of data warehousing include extracting, analyzing, and mining data, transforming, loading, and managing data so as to make them available for further use.

A deductive database combines logic programming with a relational database, for example by using the Datalog language.

A distributed database is one in which both the data and the DBMS span multiple computers.

A document-oriented database is designed for storing, retrieving, and managing document-oriented, or semi-structured, information. Document-oriented databases are one of the main categories of NoSQL databases.

An embedded database system is a DBMS which is tightly integrated with an application software that requires access to stored data in such a way that the DBMS is hidden from the application’s end-users and requires little or no ongoing maintenance.
End-user databases consist of data developed by individual end-users. Examples of these are collections of documents, spreadsheets, presentations, multimedia, and other files. Several products exist to support such databases. Some of them are much simpler than full-fledged DBMSs, with more elementary DBMS functionality.

A federated database system comprises several distinct databases, each with its own DBMS. It is handled as a single database by a federated database management system (FDBMS), which transparently integrates multiple autonomous DBMSs, possibly of different types (in which case it would also be a heterogeneous database system), and provides them with an integrated conceptual view.

Sometimes the term multi-database is used as a synonym to federated database, though it may refer to a less integrated (e.g., without an FDBMS and a managed integrated schema) group of databases that cooperate in a single application. In this case, typically middleware is used for distribution, which typically includes an atomic commit protocol (ACP), e.g., the two-phase commit protocol, to allow distributed (global) transactions across the participating databases.

A graph database is a kind of NoSQL database that uses graph structures with nodes, edges, and properties to represent and store information. General graph databases that can store any graph are distinct from specialized graph databases such as triplestores and network databases.

An array DBMS is a kind of NoSQL DBMS that allows modeling, storage, and retrieval of (usually large) multi-dimensional arrays such as satellite images and climate simulation output.

In a hypertext or hypermedia database, any word or a piece of text representing an object, e.g., another piece of text, an article, a picture, or a film, can be hyperlinked to that object. Hypertext databases are particularly useful for organizing large amounts of disparate information. For example, they are useful for organizing online encyclopedias, where users can conveniently jump around the text. The World Wide Web is thus a large distributed hypertext database.

A knowledge base (abbreviated KB or kb) is a special kind of database for knowledge management, providing the means for the computerized collection, organization, and retrieval of knowledge. It may also be a collection of data representing problems with their solutions and related experiences.

A mobile database can be carried on or synchronized from a mobile computing device.

Operational databases store detailed data about the operations of an organization. They typically process relatively high volumes of updates using transactions. Examples include customer databases that record contact, credit, and demographic information about a business’s customers, personnel databases that hold information such as salary, benefits, skills data about employees, enterprise resource planning systems that record details about product components, parts inventory, and financial databases that keep track of the organization’s money, accounting and financial dealings.

A parallel database seeks to improve performance through parallelization for tasks such as loading data, building indexes and evaluating queries.

The major parallel DBMS architectures which are induced by the underlying hardware architecture are:
Shared memory architecture, where multiple processors share the main memory space, as well as other data storage.

Shared disk architecture, where each processing unit (typically consisting of multiple processors) has its own main memory, but all units share the other storage.

Shared nothing architecture, where each processing unit has its own main memory and other storage.

Probabilistic databases employ fuzzy logic to draw inferences from imprecise data.

Real-time databases process transactions fast enough for the result to come back and be acted on right away.

A spatial database can store data with multidimensional features. Queries on such data include location-based queries, like “Where is the closest hotel in my area?”.

A temporal database has built-in time aspects, for example a temporal data model and a temporal version of SQL. More specifically the temporal aspects usually include valid-time and transaction-time.

A terminology-oriented database builds upon an object-oriented database, often customized for a specific field.

An unstructured data database is intended to store in a manageable and protected way diverse objects that do not fit naturally and conveniently in common databases. It may include email messages, documents, journals, multimedia objects, etc. The name may be misleading since some objects can be highly structured. However, the entire possible object collection does not fit into a predefined structured framework. Most established DBMSs now support unstructured data in various ways, and new dedicated DBMSs are emerging.

Design and modeling
The first task of a database designer is to produce a conceptual data model that reflects the structure of the information to be held in the database. A common approach to this is to develop an entity-relationship model, often with the aid of drawing tools. Another popular approach is the Unified Modeling Language. A successful data model will accurately reflect the possible state of the external world being modeled: for example, if people can have more than one phone number, it will allow this information to be captured. Designing a good conceptual data model requires a good understanding of the application domain; it typically involves asking deep questions about the things of interest to an organization, like “can a customer also be a supplier?”, or “if a product is sold with two different forms of packaging, are those the same product or different products?”, or “if a plane flies from New York to Dubai via Frankfurt, is that one flight or two (or maybe even three)?”. The answers to these questions establish definitions of the terminology used for entities (customers, products, flights, flight segments) and their relationships and attributes.

Producing the conceptual data model sometimes involves input from business processes, or the analysis of workflow in the organization. This can help to establish what information is needed in the database, and what can be left out. For example, it can help when deciding whether the database needs to hold historic data as well as current data.

Having produced a conceptual data model that users are happy with, the next stage is to translate this into a schema that implements the relevant data structures within the database. This process is often called logical database design, and the output is a logical data model expressed in the form of a schema. Whereas the conceptual data model is (in theory at least) independent of the choice of database technology, the logical data model will be expressed in terms of a particular database model supported by the chosen DBMS. (The terms data model and database model are often used interchangeably, but in this article we use data model for the design of a specific database, and database model for the modelling notation used to express that design.)
The most popular database model for general-purpose databases is the relational model, or more precisely, the relational model as represented by the SQL language. The process of creating a logical database design using this model uses a methodical approach known as normalization. The goal of normalization is to ensure that each elementary “fact” is only recorded in one place, so that insertions, updates, and deletions automatically maintain consistency.
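The goal of normalization can be seen in a toy example: an unnormalized table repeats a department's city in every employee row, so a change can leave the fact recorded inconsistently, while a normalized pair of tables records it once. All data and names below are invented for the illustration.

```python
# Unnormalized: the FD dept → dept_city is buried in every row.
unnormalized = [
    {"emp": "Ann", "dept": "D1", "dept_city": "Pune"},
    {"emp": "Bob", "dept": "D1", "dept_city": "Pune"},   # repeated fact
]

# Normalized: the dept → dept_city fact gets its own table.
employee = [{"emp": "Ann", "dept": "D1"}, {"emp": "Bob", "dept": "D1"}]
department = {"D1": "Pune"}        # the city is recorded exactly once

department["D1"] = "Mumbai"        # one update keeps every row consistent
print([(e["emp"], department[e["dept"]]) for e in employee])
# [('Ann', 'Mumbai'), ('Bob', 'Mumbai')]
```

In the unnormalized version the same update would have to touch both rows, and missing one of them would record two different cities for the same department.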

The final stage of database design is to make the decisions that affect performance, scalability, recovery, security, and the like, which depend on the particular DBMS. This is often called physical database design, and the output is the physical data model. A key goal during this stage is data independence, meaning that the decisions made for performance optimization purposes should be invisible to end-users and applications. There are two types of data independence: Physical data independence and logical data independence. Physical design is driven mainly by performance requirements, and requires a good knowledge of the expected workload and access patterns, and a deep understanding of the features offered by the chosen DBMS.

Another aspect of physical database design is security. It involves both defining access control to database objects as well as defining security levels and methods for the data itself.

External, conceptual, and internal views
A database management system provides three views of the database data:
The external level defines how each group of end-users sees the organization of data in the database. A single database can have any number of views at the external level.

The conceptual level unifies the various external views into a compatible global view. It provides the synthesis of all the external views. It is out of the scope of the various database end-users, and is rather of interest to database application developers and database administrators.

The internal level (or physical level) is the internal organization of data inside a DBMS. It is concerned with cost, performance, scalability and other operational matters. It deals with storage layout of the data, using storage structures such as indexes to enhance performance. Occasionally it stores data of individual views (materialized views), computed from generic data, if performance justification exists for such redundancy. It balances all the external views’ performance requirements, possibly conflicting, in an attempt to optimize overall performance across all activities.

Database languages classification
Database languages are special-purpose languages that support one or more of the following tasks, sometimes distinguished as sublanguages:
Data control language (DCL) – controls access to data;
Data definition language (DDL) – defines data types and the relationships among them (e.g., creating, altering, or dropping tables);
Data manipulation language (DML) – performs tasks such as inserting, updating, or deleting data occurrences;
Data query language (DQL) – allows searching for information and computing derived information.
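The sublanguages can be illustrated with SQLite through Python's built-in sqlite3 module. This is a minimal sketch: table and column names are invented, and SQLite has no DCL (no GRANT/REVOKE), so access control is noted only in a comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DCL (not supported by SQLite): in other DBMSs this would be
#   GRANT SELECT ON employee TO some_user;

# DDL: define a table and its columns.
cur.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")

# DML: insert, update, and delete data occurrences.
cur.execute("INSERT INTO employee (name, salary) VALUES ('Alice', 50000)")
cur.execute("INSERT INTO employee (name, salary) VALUES ('Bob', 45000)")
cur.execute("UPDATE employee SET salary = 52000 WHERE name = 'Alice'")
cur.execute("DELETE FROM employee WHERE name = 'Bob'")

# DQL: search for information and compute derived information.
rows = cur.execute("SELECT name, salary FROM employee").fetchall()
print(rows)  # [('Alice', 52000.0)]
conn.close()
```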

Database languages are specific to a particular data model. Notable examples include:
SQL combines the roles of data definition, data manipulation, and query in a single language. It was one of the first commercial languages for the relational model, although it departs in some respects from the relational model as described by Codd (for example, the rows and columns of a table can be ordered). SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the International Organization for Standardization (ISO) in 1987. The standards have been regularly enhanced since then, and SQL is supported (with varying degrees of conformance) by all mainstream commercial relational DBMSs.

OQL is an object model language standard (from the Object Data Management Group). It has influenced the design of some of the newer query languages like JDOQL and EJB QL.

XQuery is a standard XML query language implemented by XML database systems such as MarkLogic and eXist, by relational databases with XML capability such as Oracle and DB2, and also by in-memory XML processors such as Saxon.

SQL/XML combines XQuery with SQL.

A database language may also incorporate features like:
DBMS-specific configuration and storage-engine management
Computations to modify query results, like counting, summing, averaging, sorting, grouping, and cross-referencing
Constraint enforcement (e.g. in an automotive database, only allowing one engine type per car)
Application programming interface version of the query language, for programmer convenience
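Two of the features above, computations on query results and constraint enforcement, can be sketched with SQLite via Python's sqlite3 module (all table, column, and value names here are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Constraint enforcement: a CHECK constraint allows only known engine types,
# in the spirit of the automotive example above.
cur.execute("""
    CREATE TABLE car (
        model  TEXT,
        engine TEXT CHECK (engine IN ('petrol', 'diesel', 'electric')),
        price  REAL
    )
""")
cur.executemany(
    "INSERT INTO car VALUES (?, ?, ?)",
    [("A1", "petrol", 20000), ("B2", "diesel", 25000), ("C3", "petrol", 22000)],
)

# The constraint rejects invalid data.
try:
    cur.execute("INSERT INTO car VALUES ('D4', 'steam', 5000)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)

# Computations on query results: counting, summing, averaging, grouping, sorting.
summary = cur.execute("""
    SELECT engine, COUNT(*), SUM(price), AVG(price)
    FROM car GROUP BY engine ORDER BY engine
""").fetchall()
print(summary)  # [('diesel', 1, 25000.0, 25000.0), ('petrol', 2, 42000.0, 21000.0)]
conn.close()
```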
Performance, security, and availability
Because of the critical importance of database technology to the smooth running of an enterprise, database systems include complex mechanisms to deliver the required performance, security, and availability, and allow database administrators to control the use of these features.

Storage
Database storage is the container of the physical materialization of a database. It comprises the internal (physical) level in the database architecture. It also contains all the information needed (e.g., metadata, “data about the data”, and internal data structures) to reconstruct the conceptual and external levels from the internal level when needed. Putting data into permanent storage is generally the responsibility of the database engine, a.k.a. the “storage engine”. Though typically accessed by a DBMS through the underlying operating system (often using the operating system's file system as an intermediary for storage layout), storage properties and configuration settings are extremely important for the efficient operation of the DBMS and are therefore closely maintained by database administrators. A DBMS, while in operation, always has its database residing in several types of storage (e.g., memory and external storage). The database data and the additional needed information, possibly in very large amounts, are coded into bits. Data typically reside in storage in structures that look completely different from the way the data appear at the conceptual and external levels, but in ways that attempt to optimize the reconstruction of those levels when needed by users and programs, as well as the computation of additional types of needed information from the data (e.g., when querying the database).

Some DBMSs support specifying which character encoding was used to store data, so multiple encodings can be used in the same database.

Various low-level database storage structures are used by the storage engine to serialize the data model so it can be written to the medium of choice. Techniques such as indexing may be used to improve performance. Conventional storage is row-oriented, but there are also column-oriented and correlation databases.
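The row-oriented versus column-oriented distinction can be sketched in plain Python, with no DBMS involved (the data are invented):

```python
# The same table in two layouts.
rows = [("Alice", 30, 50000.0),
        ("Bob",   25, 45000.0),
        ("Carol", 35, 60000.0)]          # row-oriented: one record per entry

columns = {                               # column-oriented: one array per column
    "name":   ["Alice", "Bob", "Carol"],
    "age":    [30, 25, 35],
    "salary": [50000.0, 45000.0, 60000.0],
}

# An aggregate over a single column touches one contiguous array in the
# columnar layout, but must visit every full record in the row layout;
# this is why column stores favor analytical (aggregate-heavy) workloads.
total_row = sum(r[2] for r in rows)
total_col = sum(columns["salary"])
print(total_row, total_col)  # 155000.0 155000.0
```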

Materialized views
Often storage redundancy is employed to increase performance. A common example is storing materialized views, which consist of frequently needed external views or query results. Storing such views avoids the expense of recomputing them each time they are needed. The downsides of materialized views are the overhead incurred when updating them to keep them synchronized with the original data, and the cost of the storage redundancy itself.
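SQLite has no native materialized views, but the idea, including the synchronization overhead just mentioned, can be emulated by storing a query result in an ordinary table and refreshing it when the base data change (table and column names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (region TEXT, amount REAL)")
cur.executemany("INSERT INTO sales VALUES (?, ?)",
                [("north", 100), ("north", 50), ("south", 70)])

# "Materialize" an aggregate query result into a table.
cur.execute("""
    CREATE TABLE sales_by_region AS
    SELECT region, SUM(amount) AS total FROM sales GROUP BY region
""")
print(cur.execute("SELECT * FROM sales_by_region ORDER BY region").fetchall())
# [('north', 150.0), ('south', 70.0)]

# Base data change: the materialized result is stale until refreshed.
cur.execute("INSERT INTO sales VALUES ('south', 30)")
cur.execute("DELETE FROM sales_by_region")
cur.execute("""
    INSERT INTO sales_by_region
    SELECT region, SUM(amount) FROM sales GROUP BY region
""")
rows = cur.execute("SELECT * FROM sales_by_region ORDER BY region").fetchall()
print(rows)  # [('north', 150.0), ('south', 100.0)]
conn.close()
```

Real DBMSs with native support (e.g., a `CREATE MATERIALIZED VIEW` statement and a refresh command) automate this refresh step.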

Replication
Occasionally a database employs storage redundancy by replicating database objects (with one or more copies) to increase data availability: both to improve performance of simultaneous access by multiple end-users to the same object, and to provide resiliency in case of partial failure of a distributed database. Updates of a replicated object need to be synchronized across its copies. In many cases, the entire database is replicated.

Security
Database security deals with the various aspects of protecting the database content, its owners, and its users. It ranges from protection against intentional unauthorized use to protection against unintentional access by unauthorized entities (e.g., a person or a computer program).

Database access control deals with controlling who (a person or a certain computer program) is allowed to access what information in the database. The information may comprise specific database objects (e.g., record types, specific records, data structures), certain computations over certain objects (e.g., query types, or specific queries), or the use of specific access paths to the former (e.g., using specific indexes or other data structures to access information). Database access controls are set by specially authorized personnel (authorized by the database owner) using dedicated, protected security DBMS interfaces.

This may be managed directly on an individual basis, or by the assignment of individuals and privileges to groups, or (in the most elaborate models) through the assignment of individuals and groups to roles which are then granted entitlements. Data security prevents unauthorized users from viewing or updating the database. Using passwords, users are allowed access to the entire database or subsets of it called “subschemas”. For example, an employee database can contain all the data about an individual employee, but one group of users may be authorized to view only payroll data, while others are allowed access to only work history and medical data. If the DBMS provides a way to interactively enter and update the database, as well as interrogate it, this capability allows for managing personal databases.
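The individuals → groups → roles → entitlements model above can be sketched in plain Python. This is a toy illustration only, not any real DBMS interface; all user, role, and subschema names are invented:

```python
# Each role grants a set of actions per subschema.
role_privileges = {
    "payroll_clerk": {"payroll": {"read"}},
    "hr_officer":    {"work_history": {"read"}, "medical": {"read"}},
    "dba":           {"payroll": {"read", "write"},
                      "work_history": {"read", "write"},
                      "medical": {"read", "write"}},
}

# Individuals are assigned to roles, which are then granted entitlements.
user_roles = {"alice": {"payroll_clerk"}, "bob": {"hr_officer"}, "carol": {"dba"}}

def can(user, action, subschema):
    """Check whether any of the user's roles grants `action` on `subschema`."""
    return any(action in role_privileges.get(role, {}).get(subschema, set())
               for role in user_roles.get(user, set()))

print(can("alice", "read", "payroll"))   # True  -- payroll clerk sees payroll
print(can("alice", "read", "medical"))   # False -- but not medical data
print(can("carol", "write", "payroll"))  # True  -- DBA role covers everything
```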

Data security in general deals with protecting specific chunks of data, both physically (e.g., from corruption, destruction, or removal; see physical security) and in terms of their interpretation as meaningful information (e.g., preventing someone from deriving valid credit-card numbers from the strings of bits they comprise; see data encryption).

Change and access logging records who accessed which attributes, what was changed, and when it was changed. Logging services allow for a forensic database audit later by keeping a record of access occurrences and changes. Sometimes application-level code is used to record changes rather than leaving this to the database. Monitoring can be set up to attempt to detect security breaches.

Transactions and concurrency
Database transactions can be used to introduce some level of fault tolerance and data integrity after recovery from a crash. A database transaction is a unit of work, typically encapsulating a number of operations over a database (e.g., reading a database object, writing, acquiring a lock), an abstraction supported in databases and other systems. Each transaction has well-defined boundaries in terms of which program/code executions are included in it (determined by the transaction's programmer via special transaction commands).

The acronym ACID describes some ideal properties of a database transaction: atomicity, consistency, isolation, and durability.
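Atomicity, the "A" in ACID, can be demonstrated with SQLite through Python's sqlite3 module: when any statement in the unit of work fails, a rollback ensures no partial update survives. The account names and the CHECK rule are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
             "balance REAL CHECK (balance >= 0))")
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 20)")
conn.commit()

# A transfer is one unit of work: credit alice, then debit bob.
try:
    conn.execute("UPDATE account SET balance = balance + 50 WHERE name = 'alice'")
    conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'bob'")
    conn.commit()
except sqlite3.IntegrityError:
    # bob only has 20, so the CHECK constraint fails; roll back the whole
    # transaction, undoing alice's already-applied credit as well.
    conn.rollback()

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'alice': 100.0, 'bob': 20.0} -- no partial transfer
conn.close()
```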

Migration
A database built with one DBMS is not portable to another DBMS (i.e., the other DBMS cannot run it). However, in some situations it is desirable to migrate a database from one DBMS to another. The reasons are primarily economical (different DBMSs may have different total costs of ownership, or TCOs), functional, and operational (different DBMSs may have different capabilities). The migration involves the database's transformation from one DBMS type to another. The transformation should, if possible, leave the related applications (i.e., all related application programs) intact. Thus, the database's conceptual and external architectural levels should be maintained in the transformation; it may be desirable to maintain some aspects of the internal level as well. A complex or large database migration may be a complicated and costly (one-time) project in itself, which should be factored into the decision to migrate, even though tools may exist to help migration between specific DBMSs. Typically, a DBMS vendor provides tools to help import databases from other popular DBMSs.

Building, maintaining, and tuning
After designing a database for an application, the next stage is building the database. Typically, an appropriate general-purpose DBMS is selected for this purpose. A DBMS provides the user interfaces that database administrators use to define the application's data structures within the DBMS's data model. Other user interfaces are used to select DBMS parameters (security-related settings, storage-allocation parameters, etc.).

When the database is ready (all its data structures and other needed components are defined), it is typically populated with the application's initial data (database initialization, which is typically a distinct project, in many cases using specialized DBMS interfaces that support bulk insertion) before being made operational. In some cases, the database becomes operational while empty of application data, and data are accumulated during its operation.

After the database is created, initialized, and populated, it needs to be maintained. Various database parameters may need changing, and the database may need to be tuned for better performance; the application's data structures may be changed or added to, and new related application programs may be written to extend the application's functionality.

Backup and restore
Sometimes it is desirable to bring a database back to a previous state, for example when the database is found to be corrupted due to a software error, or when it has been updated with erroneous data. To achieve this, a backup operation is performed occasionally or continuously, in which each desired database state (i.e., the values of its data and their embedding in the database's data structures) is kept in dedicated backup files (many techniques exist to do this effectively). When a database administrator decides to bring the database back to such a state (e.g., by specifying a desired point in time when the database was in that state), these files are used to restore it.
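The backup-and-restore cycle can be sketched with SQLite's online-backup facility, exposed in Python as sqlite3.Connection.backup (available since Python 3.7). The table and data are invented:

```python
import sqlite3

# Live database with some data.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE t (x INTEGER)")
live.execute("INSERT INTO t VALUES (1), (2)")
live.commit()

# Take a backup: copy the current state into a second database.
backup = sqlite3.connect(":memory:")
live.backup(backup)

# The live database is then damaged by an erroneous update...
live.execute("DELETE FROM t")
live.commit()
print(live.execute("SELECT COUNT(*) FROM t").fetchone())  # (0,)

# ...and restored by copying the backed-up state back over it.
backup.backup(live)
restored = live.execute("SELECT x FROM t ORDER BY x").fetchall()
print(restored)  # [(1,), (2,)]
```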

Other DBMS features
Other DBMS features might include:
Database logs – help keep a history of the executed functions.

Graphics component for producing graphs and charts, especially in a data warehouse system
Query optimizer – Performs query optimization on every query to choose an efficient query plan (a partial order (tree) of operations) to be executed to compute the query result. May be specific to a particular storage engine.
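SQLite exposes its optimizer's decisions via the EXPLAIN QUERY PLAN statement, which can be used from Python to watch the chosen plan change when an index becomes available (the exact plan text varies by SQLite version; the table and index names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k INTEGER, v TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i, str(i)) for i in range(1000)])

def plan(sql):
    # EXPLAIN QUERY PLAN asks the optimizer how it would execute the query;
    # the human-readable detail is the last column of each plan row.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

q = "SELECT v FROM t WHERE k = 500"
before = plan(q)
print(before)   # e.g. ['SCAN t'] -- a full table scan

conn.execute("CREATE INDEX idx_k ON t(k)")
after = plan(q)
print(after)    # e.g. ['SEARCH t USING INDEX idx_k (k=?)'] -- index lookup
```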

Tools or hooks for database design, application programming, application program maintenance, database performance analysis and monitoring, database configuration monitoring, DBMS hardware configuration (a DBMS and related database may span computers, networks, and storage units) and related database mapping (especially for a distributed DBMS), storage allocation and database layout monitoring, storage migration, etc.

Increasingly, there are calls for a single system that incorporates all of these core functionalities into the same build, test, and deployment framework for database management and source control. Borrowing from other developments in the software industry, some market such offerings as “DevOps for database”.
List of some database products, with license and stable-version information:
DB Product First public release date Latest stable version License
4D (4th Dimension) 1984 v16.0 Proprietary
ADABAS 1970 8.1 Proprietary
Adaptive Server Enterprise 1987 16 Proprietary
Advantage Database Server (ADS) 1992 12 Proprietary
Altibase 2000 7.1.0.1.2 GNU-AGPLv3; GNU-LGPLv3 (for client libraries)
Apache Derby 2004 10.14.2.0 Apache License
ClustrixDB 2010 v7.0 Proprietary
CUBRID 2008-11 10.1.0 GPL v2 or later, BSD license for APIs and GUI tools
Datacom Early 70s 14 Proprietary
DB2 1983 11.1 Proprietary
Empress Embedded Database 1979 10.2 Proprietary
EXASolution 2004 6.0.0 Proprietary
FileMaker 1985-04 16 Proprietary
Firebird 7/25/2000 3.0.3 IPL and IDPL
GPUdb 2014 3.2.5 Proprietary
H2 2005 1.4.197 EPL and modified MPL
HP NonStop SQL 1987 SQL/MX 3.4 Proprietary
HSQLDB 2001 2.4.1 BSD
Informix Dynamic Server 1980 12.10.xC10 Proprietary
Ingres 1974 10.2 GPL and Proprietary
InterBase 1984 XE7 v12.0.4.357 Proprietary
Linter SQL RDBMS 1990 6.0.18, 6.1.15 Proprietary
LucidDB 2007-01 0.9.4 GPL v2
MariaDB 2/1/2010 10.3.8 GPL v2, LGPL (for client libraries)
MaxDB 2003-05 7.9.0.8 Proprietary
MemSQL 2012-06 5.8.6 (2017) Proprietary
Microsoft Access (JET) 1992 16 (2016) Proprietary
Microsoft SQL Server 1989 2017 (14) Proprietary
Microsoft SQL Server Compact (Embedded Database) 2000 2011 (v4.0) Proprietary
Microsoft Visual Foxpro 1984 9 (2005) Proprietary
Mimer SQL 1978 v10.1.4 Proprietary
MonetDB 2004 Mar2018-SP1 Mozilla Public License, version 2.0
mSQL 1994 4.1 Proprietary
MySQL 1995-11 8.0.11 GPL v2 or Proprietary
NexusDB 2003 4.00.14 Proprietary
NuoDB 2013 3.0.1 Proprietary
Omnis Studio 1982-07 6.1.3 Release 1 Proprietary
OpenBase SQL 1991 11.0.0 Proprietary
OpenEdge 1984 11.6.3 Proprietary
OpenLink Virtuoso 1998 7.2.5 GPL v2 or Proprietary
Oracle DB 1979-11 12.2.0.1 Proprietary
Oracle Rdb 1984 7.3.1.2 Proprietary
Paradox 1985 11 Proprietary
Percona Server for MySQL 2006 5.7.21-21 GPL v2
Pervasive PSQL 1982 v12 Proprietary
Polyhedra DBMS 1993 9 Proprietary, with Polyhedra Lite available as Freeware
PostgreSQL 1989-06 10.5 PostgreSQL Licence (a liberal open-source license)
R:Base 1982 10 Proprietary
Raima Database Manager 1984 14.1 Proprietary
RDM Server 1993 8.4 Proprietary
SAP HANA 2010 2.0 SPS02 Proprietary
solidDB 1992 7.0.0.10 Proprietary
SQL Anywhere 1992 17 Proprietary
SQLBase 1982 11.5 Proprietary
SQLite 9/12/2000 3.24.0 Public domain
SQream DB 2014 2.132 Proprietary
Superbase 1984 Scientific (2004) Proprietary
Teradata 1984 15 Proprietary
Tibero 1992 6 Proprietary
UniData 1988 8.2.1 Proprietary
Database product support on the Windows / Linux / UNIX operating systems:
DB Product Name Windows Linux UNIX
4th Dimension Yes No No
ADABAS Yes Yes Yes
Adaptive Server Enterprise Yes Yes Yes
Advantage Database Server Yes Yes No
Altibase Yes Yes Yes
Apache Derby Yes Yes Yes
ClustrixDB No Yes Yes
CUBRID Yes Yes No
DB2 Yes Yes Yes
Empress Embedded Database Yes Yes Yes
EXASolution No Yes No
FileMaker Yes Yes No
Firebird Yes Yes Yes
HSQLDB Yes Yes Yes
H2 Yes Yes Yes
Informix Dynamic Server Yes Yes Yes
Ingres Yes Yes Yes
InterBase Yes Yes Yes (Solaris)
Linter SQL RDBMS Yes Yes Yes
LucidDB Yes Yes No
MariaDB Yes Yes Yes
MaxDB Yes Yes Yes
Microsoft Access (JET) Yes No No
Microsoft Visual Foxpro Yes No No
Microsoft SQL Server Yes Yes No
Microsoft SQL Server Compact (Embedded Database) Yes No No
Mimer SQL Yes Yes Yes
MonetDB Yes Yes Yes
MySQL Yes Yes Yes
Omnis Studio Yes Yes No
OpenBase SQL Yes Yes Yes
OpenEdge Yes Yes Yes
OpenLink Virtuoso Yes Yes Yes
Oracle Yes Yes Yes
Oracle Rdb No No No
Pervasive PSQL Yes Yes No
Polyhedra Yes Yes Yes
PostgreSQL Yes Yes Yes
R:Base Yes No No
Raima Database Manager Yes Yes Yes
RDM Server Yes Yes Yes
SAP HANA Yes Yes No
solidDB Yes Yes Yes
SQL Anywhere Yes Yes Yes
SQLBase Yes Yes No
SQLite Yes Yes Yes
SQream DB No Yes No
Superbase Yes No No
Teradata Yes Yes Yes
Tibero Yes Yes Yes
UniData Yes Yes Yes
UniVerse Yes Yes Yes
Database products with support for features such as ACID, referential integrity, and transaction management:
Database Name ACID Referential integrity Transactions
4th Dimension Yes Yes Yes
ADABAS Yes No Yes
Adaptive Server Enterprise Yes Yes Yes
Advantage Database Server Yes Yes Yes
Altibase Yes Yes Yes
Apache Derby Yes Yes Yes
ClustrixDB Yes Yes Yes
CUBRID Yes Yes Yes
DB2 Yes Yes Yes
Empress Embedded Database Yes Yes Yes
EXASolution Yes Yes Yes
Firebird Yes Yes Yes
HSQLDB Yes Yes Yes
H2 Yes Yes Yes
Informix Dynamic Server Yes Yes Yes
Ingres Yes Yes Yes
InterBase Yes Yes Yes
Linter SQL RDBMS Yes Yes Yes (except for DDL)
LucidDB Yes No No
MariaDB Yes Yes Yes, except for DDL
MaxDB Yes Yes Yes
Microsoft Access (JET) Yes Yes Yes
Microsoft Visual FoxPro No Yes Yes
Microsoft SQL Server Yes Yes Yes
Microsoft SQL Server Compact (Embedded Database) Yes Yes Yes
Mimer SQL Yes Yes Yes
MonetDB Yes Yes Yes
MySQL Yes Yes Yes, except for DDL
OpenBase SQL Yes Yes Yes
OpenEdge Yes Yes Yes
OpenLink Virtuoso Yes Yes Yes
Oracle Yes Yes Yes except for DDL
Oracle Rdb Yes Yes Yes
Pervasive PSQL Yes Yes Yes
Polyhedra DBMS Yes Yes Yes
PostgreSQL Yes Yes Yes
Raima Database Manager Yes Yes Yes
RDM Server Yes Yes Yes
SAP HANA Yes Yes Yes
solidDB Yes Yes Yes
SQL Anywhere Yes Yes Yes
SQLBase Yes Yes Yes
SQLite Yes Yes Yes
Teradata Yes Yes Yes
Tibero Yes Yes Yes
UniData Yes No Yes
UniVerse Yes No Yes
Database vendors – market share
According to DB-Engines, in June 2018 the most widely used systems were Oracle, MySQL (free software), Microsoft SQL Server, PostgreSQL (free software), IBM DB2, Microsoft Access, and SQLite (free software).
According to research company Gartner, in 2011 the five leading proprietary relational-database vendors by revenue were Oracle (48.8%), IBM (20.2%), Microsoft (17.0%), SAP including Sybase (4.6%), and Teradata (3.7%).

Advantages and features of databases
Many databases are already available on the market from vendors such as Oracle, Microsoft, SAP, and IBM. A few are free, while others carry proprietary licenses. Some run only on a particular operating system, while others run on nearly all of them. Some support particular features that others lack; some ship as community editions, others as cluster editions, and so on. Every database product or version comes with its strengths and weaknesses, making it a perfect or a poor choice depending on the needed combination of features.

As SQL – the structured query language (T-SQL, PL/SQL, ANSI SQL) – is used to write queries, some of the major jobs needed and performed are as follows:
Fetching data (using SELECT).
Inserting data (using manual INSERT, bulk insert, or a utility to transfer data).
Removing unwanted data (deleting data rows).
Fetching data, as mentioned above, but based on some condition or criteria (using a WHERE clause).

Fetching data, with or without a condition, with full or partial projection of columns.

Fetching data, with or without a condition, with full or partial projection of columns, from multiple sources based on some relation.

Reporting data using aggregation of data.

Sorting data on one or more columns to arrange it in ascending or descending order.

And many more.
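Most of these jobs can be demonstrated in a few lines using SQLite via Python's built-in sqlite3 module (table names and data are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE emp (name TEXT, dept_id INTEGER, salary REAL)")
cur.executemany("INSERT INTO dept VALUES (?, ?)", [(1, "IT"), (2, "HR")])
cur.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [("Alice", 1, 60000), ("Bob", 1, 55000), ("Carol", 2, 50000)])

# Fetch with a condition (WHERE) and partial projection of columns.
print(cur.execute("SELECT name FROM emp WHERE salary > 52000").fetchall())

# Fetch from multiple sources based on a relation (a join).
print(cur.execute("""
    SELECT e.name, d.name FROM emp e JOIN dept d ON e.dept_id = d.id
""").fetchall())

# Report via aggregation, sorted in descending order.
report = cur.execute("""
    SELECT d.name, AVG(e.salary) AS avg_salary
    FROM emp e JOIN dept d ON e.dept_id = d.id
    GROUP BY d.name ORDER BY avg_salary DESC
""").fetchall()
print(report)  # [('IT', 57500.0), ('HR', 50000.0)]

# Remove unwanted rows.
cur.execute("DELETE FROM emp WHERE salary < 52000")
conn.close()
```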

One consideration is that DBMS software needs to be installed and configured, and one needs to plan for the data design, hardware/software costs, and the time cost of learning that particular flavor and version; these can change the total cost of ownership for the product and may adversely affect the project budget.
Also, performance often suffers because many readers block writers, and DBAs (database administrators) need to adopt replication solutions to apply load balancing along with HA (high availability) and FO (fail-over).

Operating System – Introduction
Since the invention of the computer in the 1940s and 1950s, systems have moved from serial processing to simple batch processing, and from multiprogrammed batch processing to time-sharing systems; along the way, operating systems emerged to provide a particular set of services as programs and to make their users' lives easier.

Among the services provided by an operating system one can identify program execution, I/O operations, file management, IPC (inter-process communication), error detection, memory management, hardware and peripheral management, user management, and scheduling, to name a few that cannot be missed. Security and the user interface are further aspects that can be discussed at length.

Famous operating systems include DOS and the Windows family (3.x, 95/98/SE/ME/XP/Vista/NT/7/8/10), apart from the MULTICS-influenced Unix family: SCO, BSD (FreeBSD, OpenBSD), Xenix, Linux, and others.
We have many operating systems today, depending on our gadget, needs, and suitability; the main reason, one assumes, is that the operating system is the layer between man and machine. It performs common tasks such as providing a user interface (CUI/GUI), peripheral management, file management, user management, process management, storage management, security management, and so on.
We focus on Linux as the operating system in our discussion. The Linux kernel was a project of Linus Torvalds and originated in 1991. It is an open-license, open-source variant of Unix. Famous distributions such as SUSE, Fedora, CentOS, Ubuntu, Red Hat, and Google's Android and Chrome OS use the Linux kernel. It is more popular on servers than on desktops. In Linux, apart from the kernel, there are many utilities and system programs that come from GNU, free of cost.

Most end-users, depending on whether they prefer performance or ease of use, choose either CLI or GUI mode. For ease of use, a large segment of non-technical people prefer the GUI mode of operating systems such as Windows, or the X Window System (X/X11) on Linux and Unix-like systems.

The minimal need today for working with any computer-based system is an appropriate operating system, followed by the required set of application programs. These can range from editors and media players to specific application software such as browsers and database programs for managing data.

Every additional application program means at least one more process to schedule, with some memory plus the additional cost of the basic software footprint. The more processes, the more load on the system, but that is how multi-user, multi-application systems work.

Figure 1: System layers – Users / Application software / Operating system / Hardware
Apart from good hardware, operating system features also play a vital role in performance and speed. A lot has therefore been done in recent years to keep a check on what features an operating system has and how they can be utilized to get the best from the combination of hardware and software.

A Linux system is comprised mainly of the kernel, system libraries, and system utilities.

The kernel handles all core operating system responsibilities, such as process management and virtual memory. System libraries are sets of functions through which applications interact with the kernel; one such system library is libc. System utilities are programs that perform specialized tasks.
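This layering of thin system-call wrappers versus buffered library routines can be illustrated from Python (a sketch: os.write hands data more or less directly to the kernel's write() system call, while file objects add a user-space buffer on top, analogous to libc's stdio sitting above raw system calls):

```python
import os
import tempfile

# A scratch file; mkstemp returns a raw file descriptor plus its path.
fd, path = tempfile.mkstemp()

# Syscall-level I/O: os.write is a thin wrapper around the kernel's write().
os.write(fd, b"via system call\n")
os.close(fd)

# Library-level I/O: the file object buffers data in user space and only
# hands it to the kernel on flush/close (like libc's fwrite over write()).
with open(path, "a") as f:
    f.write("via buffered library layer\n")

with open(path) as f:
    content = f.read()
print(content)
os.remove(path)
```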

Functions performed by an Operating System
An operating system performs many functions, but its main goal is to provide the interface between the user and the hardware, i.e., the means by which the user works on the system. The various functions performed by the operating system are explained below.
Operating system as a resource manager
The operating system is also known as the resource manager: it manages all the resources attached to the system, such as memory, the processor, and all the input/output devices. It decides at which time the CPU performs which operation, at which time memory is used by which programs, and which input device responds to which user request, i.e., when the input and output devices are used by which programs.
Storage management
The operating system also controls all storage operations: how data or files are stored on the computer, how files are accessed by users, and so on. All the operations responsible for storing and accessing files are determined by the operating system. It also allows the creation of files and directories, reading and writing their data, and copying the contents of files and directories from one place to another.
Process management
The operating system also handles process management: all processes, whether submitted by the user or belonging to the system itself, are handled by the operating system. It creates priorities for users, starts and stops the execution of processes, and creates child processes by dividing large processes into smaller ones.
Memory management
The operating system also manages the computer's memory: it allocates memory to a process and de-allocates it again, including when a process completes.
Extended machine
The operating system also behaves like an extended machine: it provides sharing of files between multiple users, graphical environments, various languages for communication, and many complex operations involving multiple hardware and software components.
Other activities, acting like a mastermind
The operating system performs many further functions, for which it can be called a mastermind. It handles booting, provides the facility to increase the logical memory of the system by using its physical memory, and supports various formats such as the NTFS and FAT file systems.

The operating system also controls errors that occur in programs, provides recovery when the system gets damaged (e.g., when it does not work properly due to a hardware failure), and provides backup facilities. It also breaks large programs into smaller units, called threads, and executes them one by one.
A monolithic OS architecture