Databases - Chapter 20: Database System Architectures
Database System Concepts, 5th Ed.
©Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on reuse
Chapter 20: Database System Architectures
Version: Oct 5, 2006
Chapter 20: Database System Architectures
n Centralized and Client-Server Systems
n Server System Architectures
n Parallel Systems
n Distributed Systems
n Network Types
Centralized Systems
n Run on a single computer system and do not interact with other
computer systems.
n General-purpose computer system: one to a few CPUs and a number
of device controllers that are connected through a common bus that
provides access to shared memory.
n Single-user system (e.g., personal computer or workstation): desktop
unit, single user, usually has only one CPU and one or two hard
disks; the OS may support only one user.
n Multi-user system: more disks, more memory, multiple CPUs, and a
multi-user OS. Serves a large number of users who are connected to
the system via terminals. Often called server systems.
A Centralized Computer System
Client-Server Systems
n Server systems satisfy requests generated at m client systems, whose
general structure is shown in the accompanying figure.
Client-Server Systems (Cont.)
n Database functionality can be divided into:
l Back-end: manages access structures, query evaluation and
optimization, concurrency control and recovery.
l Front-end: consists of tools such as forms, report writers, and
graphical user interface facilities.
n The interface between the front-end and the back-end is through SQL or
through an application program interface.
Client-Server Systems (Cont.)
n Advantages of replacing mainframes with networks of workstations or
personal computers connected to back-end server machines:
l better functionality for the cost
l flexibility in locating resources and expanding facilities
l better user interfaces
l easier maintenance
Server System Architecture
n Server systems can be broadly categorized into two kinds:
l transaction servers which are widely used in relational database
systems, and
l data servers, used in object-oriented database systems
Transaction Servers
n Also called query server systems or SQL server systems
l Clients send requests to the server
l Transactions are executed at the server
l Results are shipped back to the client.
n Requests are specified in SQL, and communicated to the server
through a remote procedure call (RPC) mechanism.
n Transactional RPC allows many RPC calls to form a transaction.
n Open Database Connectivity (ODBC) is a C language application
program interface standard from Microsoft for connecting to a server,
sending SQL requests, and receiving results.
n The JDBC standard is similar to ODBC, but for Java
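As a concrete illustration, here is a minimal sketch of the ODBC call
sequence in C (connect to a data source, send an SQL request, fetch the
results). The DSN bankdb, the credentials, and the account table are
hypothetical placeholders, and error handling is pared down to bare
return-code checks.

/* Minimal ODBC client sketch. Assumes a configured DSN "bankdb" and a
   table account(account_number, ...); both are invented for illustration. */
#include <stdio.h>
#include <sql.h>
#include <sqlext.h>

int main(void) {
    SQLHENV env; SQLHDBC dbc; SQLHSTMT stmt;
    SQLCHAR acct[32]; SQLLEN ind;

    /* Allocate environment and connection handles; request ODBC 3. */
    SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);
    SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (SQLPOINTER)SQL_OV_ODBC3, 0);
    SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);

    /* Connect to the server named by the data source. */
    if (!SQL_SUCCEEDED(SQLConnect(dbc, (SQLCHAR *)"bankdb", SQL_NTS,
                                  (SQLCHAR *)"user", SQL_NTS,
                                  (SQLCHAR *)"secret", SQL_NTS))) {
        fprintf(stderr, "connect failed\n");
        return 1;
    }

    /* Send the SQL request and fetch result rows one at a time. */
    SQLAllocHandle(SQL_HANDLE_STMT, dbc, &stmt);
    SQLExecDirect(stmt,
                  (SQLCHAR *)"SELECT account_number FROM account", SQL_NTS);
    SQLBindCol(stmt, 1, SQL_C_CHAR, acct, sizeof acct, &ind);
    while (SQL_SUCCEEDED(SQLFetch(stmt)))
        printf("%s\n", acct);

    SQLFreeHandle(SQL_HANDLE_STMT, stmt);
    SQLDisconnect(dbc);
    SQLFreeHandle(SQL_HANDLE_DBC, dbc);
    SQLFreeHandle(SQL_HANDLE_ENV, env);
    return 0;
}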
Transaction Server Process Structure
n A typical transaction server consists of multiple processes accessing
data in shared memory.
n Server processes
l These receive user queries (transactions), execute them and send
results back
l Processes may be multithreaded, allowing a single process to
execute several user queries concurrently
l Typically multiple multithreaded server processes
n Lock manager process
l More on this later
n Database writer process
l Output modified buffer blocks to disks continually
Transaction Server Processes (Cont.)
n Log writer process
l Server processes simply add log records to log record buffer
l Log writer process outputs log records to stable storage.
n Checkpoint process
l Performs periodic checkpoints
n Process monitor process
l Monitors other processes, and takes recovery actions if any of the other
processes fail
E.g. aborting any transactions being executed by a server process
and restarting it
Transaction System Processes (Cont.)
n Shared memory contains shared data
l Buffer pool
l Lock table
l Log buffer
l Cached query plans (reused if same query submitted again)
n All database processes can access shared memory
n To ensure that no two processes access the same data structure
at the same time, database systems implement mutual exclusion
using either
l Operating system semaphores
l Atomic instructions such as test-and-set
n To avoid overhead of interprocess communication for lock
request/grant, each database process operates directly on the lock
table
l instead of sending requests to lock manager process
n Lock manager process still used for deadlock detection
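To make the mutual-exclusion bullet concrete, here is a minimal C11
sketch of a spinlock protecting the shared lock table, built on the atomic
test-and-set operation mentioned above. It assumes the flag lives where
every database process can reach it (in a real system, inside the
shared-memory segment); the OS-semaphore alternative would block the
process instead of spinning.

#include <stdatomic.h>

/* One flag guards the shared lock table; it would normally be placed
   in the shared-memory segment alongside the table itself. */
static atomic_flag lock_table_mutex = ATOMIC_FLAG_INIT;

void lock_table_enter(void) {
    /* test-and-set returns the previous value: we own the mutex
       exactly when the previous value was clear. */
    while (atomic_flag_test_and_set(&lock_table_mutex))
        ;  /* busy-wait until the holder clears the flag */
}

void lock_table_exit(void) {
    atomic_flag_clear(&lock_table_mutex);
}

Each process brackets its direct reads and writes of the lock table with
lock_table_enter()/lock_table_exit(), so no two processes manipulate the
table at the same time.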
Data Servers
n Used in high-speed LANs, in cases where
l The clients are comparable in processing power to the server
l The tasks to be executed are compute intensive.
n Data are shipped to clients, where processing is performed, and the
results are then shipped back to the server.
n This architecture requires full back-end functionality at the clients.
n Used in many object-oriented database systems
n Issues:
l Page-shipping versus item-shipping
l Locking
l Data Caching
l Lock Caching
Data Servers (Cont.)
n Page-shipping versus item-shipping
l Smaller unit of shipping ⇒ more messages
l Worth prefetching related items along with requested item
l Page shipping can be thought of as a form of prefetching
n Locking
l Overhead of requesting and getting locks from server is high due
to message delays
l Can grant locks on requested and prefetched items; with page
shipping, transaction is granted lock on whole page.
l Locks on prefetched items can be called back by the server,
and returned by the client transaction if a prefetched item has not
been used.
l Locks on a page can be de-escalated to locks on items in the
page when there are lock conflicts. Locks on unused items can
then be returned to the server.
Data Servers (Cont.)
n Data Caching
l Data can be cached at client even in between transactions
l But check that data is up-to-date before it is used (cache coherency)
l Check can be done when requesting lock on data item
n Lock Caching
l Locks can be retained by client system even in between transactions
l Transactions can acquire cached locks locally, without contacting
server
l Server calls back locks from clients when it receives conflicting lock
request. Client returns lock once no local transaction is using it.
l Similar to de-escalation, but across transactions.
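The sketch below shows one way lock caching with server callbacks could
look in C. The fixed-size cache, the function names, and the stubbed RPCs
are all illustrative assumptions, not taken from any particular system.

#include <stdbool.h>

#define CACHE_SIZE 64

struct cached_lock {
    int  item_id;    /* data item the lock covers     */
    int  use_count;  /* local transactions holding it */
    bool valid;
};

static struct cached_lock cache[CACHE_SIZE];

/* Stub RPCs standing in for real messages to the lock server. */
static bool server_request_lock(int item_id) { (void)item_id; return true; }
static void server_return_lock(int item_id)  { (void)item_id; }

/* Acquire: hit the cache first; contact the server only on a miss. */
bool acquire_lock(int item_id) {
    for (int i = 0; i < CACHE_SIZE; i++)
        if (cache[i].valid && cache[i].item_id == item_id) {
            cache[i].use_count++;          /* no message needed */
            return true;
        }
    if (!server_request_lock(item_id))
        return false;
    for (int i = 0; i < CACHE_SIZE; i++)
        if (!cache[i].valid) {
            cache[i] = (struct cached_lock){item_id, 1, true};
            break;
        }
    return true;
}

/* Local release: the lock stays cached for reuse by later transactions. */
void release_lock(int item_id) {
    for (int i = 0; i < CACHE_SIZE; i++)
        if (cache[i].valid && cache[i].item_id == item_id)
            cache[i].use_count--;
}

/* Callback: the server saw a conflicting request. Return the lock if no
   local transaction holds it; a fuller version would otherwise mark it
   for return when the last local user releases it. */
void on_lock_callback(int item_id) {
    for (int i = 0; i < CACHE_SIZE; i++)
        if (cache[i].valid && cache[i].item_id == item_id &&
            cache[i].use_count == 0) {
            cache[i].valid = false;
            server_return_lock(item_id);
        }
}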
Parallel Systems
n Parallel database systems consist of multiple processors and multiple
disks connected by a fast interconnection network.
n A coarse-grain parallel machine consists of a small number of
powerful processors
n A massively parallel or fine-grain parallel machine utilizes
thousands of smaller processors.
n Two main performance measures:
l throughput: the number of tasks that can be completed in a
given time interval
l response time: the amount of time it takes to complete a single
task from the time it is submitted
Speed-Up and Scale-Up
n Speedup: a fixed-sized problem executing on a small system is given
to a system which is N-times larger.
l Measured by:
speedup = (small system elapsed time) / (large system elapsed time)
l Speedup is linear if the ratio equals N.
n Scaleup: increase the size of both the problem and the system
l N-times larger system used to perform an N-times larger job
l Measured by:
scaleup = (small system, small problem elapsed time) / (big system, big problem elapsed time)
l Scaleup is linear if the ratio equals 1.
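A toy numerical check of the two ratios in C, with all timings invented for
illustration: on a system 4 times larger, a speedup of 100/26 is about 3.85
(slightly sublinear), and a scaleup of 100/104 is about 0.96.

#include <stdio.h>

int main(void) {
    double N = 4.0;              /* how much larger the big system is  */
    double t_small   = 100.0;    /* small system, original problem (s) */
    double t_large   = 26.0;     /* large system, same problem (s)     */
    double t_big_big = 104.0;    /* large system, N-times larger job   */

    double speedup = t_small / t_large;    /* linear if it equals N */
    double scaleup = t_small / t_big_big;  /* linear if it equals 1 */

    printf("speedup = %.2f (linear would be %.0f)\n", speedup, N);
    printf("scaleup = %.2f (linear would be 1)\n", scaleup);
    return 0;
}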
Speedup
Scaleup
Batch and Transaction Scaleup
n Batch scaleup:
l A single large job; typical of most decision support queries and
scientific simulation.
l Use an N-times larger computer on an N-times larger problem.
n Transaction scaleup:
l Numerous small queries submitted by independent users to a
shared database; typical of transaction-processing and timesharing
systems.
l N-times as many users submitting requests (hence, N-times as
many requests) to an N-times larger database, on an N-times
larger computer.
l Well-suited to parallel execution.
Factors Limiting Speedup and Scaleup
Speedup and scaleup are often sublinear due to:
n Startup costs: Cost of starting up multiple processes may dominate
computation time, if the degree of parallelism is high.
n Interference: Processes accessing shared resources (e.g., system
bus, disks, or locks) compete with each other, thus spending time
waiting on other processes, rather than performing useful work.
n Skew: Increasing the degree of parallelism increases the variance in
the service times of tasks executing in parallel. Overall execution time
is determined by the slowest of the parallel tasks.
Interconnection Network Architectures
n Bus. System components send data on and receive data from a single
communication bus;
l Does not scale well with increasing parallelism.
n Mesh. Components are arranged as nodes in a grid, and each
component is connected to all adjacent components
l Communication links grow with growing number of components,
and so scales better.
l But may require 2√n hops to send a message to a node (or √n
with wraparound connections at the edges of the grid).
n Hypercube. Components are numbered in binary; components are
connected to one another if their binary representations differ in
exactly one bit.
l Each of the n components is connected to log(n) other components,
and any two components can reach each other via at most log(n)
links; this reduces communication delays.
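A short C sketch of the hypercube numbering rule: two nodes are neighbors
exactly when their binary ids differ in one bit, and the hop count between
any two nodes is the number of differing bits, at most log2(n) for n nodes.

#include <stdio.h>

/* Count 1-bits; a portable loop rather than a compiler builtin. */
static int popcount(unsigned x) {
    int c = 0;
    for (; x; x >>= 1) c += x & 1;
    return c;
}

static int is_neighbor(unsigned i, unsigned j) { return popcount(i ^ j) == 1; }
static int hops(unsigned i, unsigned j)        { return popcount(i ^ j); }

int main(void) {
    /* 8-node (3-dimensional) hypercube: node 0 (000) neighbors
       1 (001), 2 (010), 4 (100); node 7 (111) is 3 hops away. */
    printf("0-1 neighbors: %d\n", is_neighbor(0, 1)); /* prints 1 */
    printf("0-3 neighbors: %d\n", is_neighbor(0, 3)); /* prints 0 */
    printf("hops 0 to 7  : %d\n", hops(0, 7));        /* prints 3 */
    return 0;
}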
Interconnection Architectures
Parallel Database Architectures
n Shared memory – processors share a common memory
n Shared disk – processors share a common disk
n Shared nothing – processors share neither a common memory nor
a common disk
n Hierarchical – hybrid of the above architectures
Shared Memory
n Processors and disks have access to a common memory, typically via
a bus or through an interconnection network.
n Extremely efficient communication between processors — data in
shared memory can be accessed by any processor without having to
move it using software.
n Downside – architecture is not scalable beyond 32 or 64 processors
since the bus or the interconnection network becomes a bottleneck
n Widely used for lower degrees of parallelism (4 to 8).
Shared Disk
n All processors can directly access all disks via an interconnection
network, but the processors have private memories.
l The memory bus is not a bottleneck
l Architecture provides a degree of fault-tolerance — if a processor
fails, the other processors can take over its tasks since the database
is resident on disks that are accessible from all processors.
n Examples: IBM Sysplex and DEC clusters (now part of Compaq)
running Rdb (now Oracle Rdb) were early commercial users
n Downside: bottleneck now occurs at interconnection to the disk
subsystem.
n Shared-disk systems can scale to a somewhat larger number of
processors, but communication between processors is slower.
Shared Nothing
n Node consists of a processor, memory, and one or more disks.
Processors at one node communicate with another processor at
another node using an interconnection network. A node functions as
the server for the data on the disk or disks the node owns.
n Examples: Teradata, Tandem, Oracle nCUBE
n Data accessed from local disks (and local memory accesses) do not
pass through the interconnection network, thereby minimizing the
interference of resource sharing.
n Shared-nothing multiprocessors can be scaled up to thousands of
processors without interference.
n Main drawback: cost of communication and nonlocal disk access;
sending data involves software interaction at both ends.
Hierarchical
n Combines characteristics of shared-memory, shared-disk, and
shared-nothing architectures.
n Top level is a shared-nothing architecture – nodes connected by an
interconnection network, and do not share disks or memory with each
other.
n Each node of the system could be a shared-memory system with a
few processors.
n Alternatively, each node could be a shared-disk system, and each of
the systems sharing a set of disks could be a shared-memory system.
n Reduce the complexity of programming such systems by distributed
virtual-memory architectures
l Also called non-uniform memory architecture (NUMA)
Distributed Systems
n Data spread over multiple machines (also referred to as sites or
nodes).
n Network interconnects the machines
n Data shared by users on multiple machines
Distributed Databases
n Homogeneous distributed databases
l Same software/schema on all sites, data may be partitioned
among sites
l Goal: provide a view of a single database, hiding details of
distribution
n Heterogeneous distributed databases
l Different software/schema on different sites
l Goal: integrate existing databases to provide useful functionality
n Differentiate between local and global transactions
l A local transaction accesses data in the single site at which the
transaction was initiated.
l A global transaction either accesses data in a site different from
the one at which the transaction was initiated or accesses data in
several different sites.
Tradeoffs in Distributed Systems
n Sharing data – users at one site are able to access data residing at
some other sites.
n Autonomy – each site is able to retain a degree of control over data
stored locally.
n Higher system availability through redundancy — data can be
replicated at remote sites, and system can function even if a site fails.
n Disadvantage: added complexity required to ensure proper
coordination among sites.
l Software development cost.
l Greater potential for bugs.
l Increased processing overhead.
Implementation Issues for Distributed
Databases
n Atomicity needed even for transactions that update data at multiple sites
n The two-phase commit protocol (2PC) is used to ensure atomicity (a
coordinator-side sketch appears after this list)
l Basic idea: each site executes the transaction until just before
commit, and then leaves the final decision to a coordinator
l Each site must follow the decision of the coordinator, even if there
is a failure while waiting for the coordinator's decision
n 2PC is not always appropriate: other transaction models, based on
persistent messaging and workflows, are also used
n Distributed concurrency control (and deadlock detection) required
n Data items may be replicated to improve data availability
n Details of above in Chapter 22
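Here is a coordinator-side sketch of 2PC in C, under the simplifying
assumptions that messaging is reduced to stub functions and that the
forced log writes a real implementation needs before each message are
omitted.

#include <stdbool.h>
#include <stdio.h>

#define NUM_SITES 3

/* Stub: ask site i to prepare; true means it votes "ready to commit". */
static bool send_prepare(int site) { (void)site; return true; }

/* Stub: announce the global decision to site i. */
static void send_decision(int site, bool commit) {
    printf("site %d: %s\n", site, commit ? "COMMIT" : "ABORT");
}

void coordinator_commit(void) {
    /* Phase 1: collect votes; one "no" (or a timeout) forces abort. */
    bool all_ready = true;
    for (int i = 0; i < NUM_SITES; i++)
        if (!send_prepare(i)) { all_ready = false; break; }

    /* Phase 2: broadcast the decision; every site must follow it,
       even a site that failed and recovered while waiting. */
    for (int i = 0; i < NUM_SITES; i++)
        send_decision(i, all_ready);
}

int main(void) { coordinator_commit(); return 0; }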
Network Types
n Localarea networks (LANs) – composed of processors that are
distributed over small geographical areas, such as a single building or
a few adjacent buildings.
n Widearea networks (WANs) – composed of processors distributed
over a large geographical area.
Network Types (Cont.)
n WANs with continuous connection (e.g., the Internet) are needed for
implementing distributed database systems
n Groupware applications such as Lotus Notes can work on WANs with
discontinuous connection:
l Data is replicated.
l Updates are propagated to replicas periodically.
l Copies of data may be updated independently.
l Non-serializable executions can thus result. Resolution is
application dependent.
End of Chapter