Software Engineering
Software is the sequence of instructions in one or more programming languages that comprises a computer application to automate some business function. Engineering is the use of tools and techniques in problem solving. Putting the two words together, software engineering is the systematic application of tools and techniques in the development of computer-based applications.
The software engineering process describes the steps it takes to develop the system. We begin a development project with the notion that there is a problem to be solved via automation. The process is how you get from problem recognition to a working solution. A quality process is desirable because it is more likely to lead to a quality product. The process followed by a project team during the development life cycle of an application should be orderly, goal-oriented, enjoyable, and a learning experience.
Object-oriented methodology is an approach to system life-cycle development that takes a top-down view of data objects, their allowable actions, and the underlying communication requirements to define a system architecture. The data and action components are encapsulated, that is, they are combined together, to form abstract data types. Encapsulation means that if I know what data I want, I also know the allowable processes against that data. Data are designed as lattice hierarchies of relationships to ensure that top-down, hierarchic inheritance and sideways relationships are accommodated. Encapsulated objects are constrained to communicate only via messages. At a minimum, messages indicate the receiver and the action requested. Messages may be more elaborate, including the sender and the data to be acted upon.
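A minimal Python sketch of these ideas (the Account class and its attributes are invented for illustration, not taken from the text): the data and the allowable actions on them are encapsulated in one object, and other objects interact with it only by sending a message naming the receiver, the requested action, and any data to be acted upon.

class Account:
    """Encapsulated object: the data and the allowable actions on it live together."""

    def __init__(self, owner, balance=0):
        self._owner = owner      # encapsulated data
        self._balance = balance  # never touched directly from outside

    def receive(self, action, amount=None):
        """Message interface: the receiver plus the requested action (and optional data)."""
        if action == "deposit":
            self._balance += amount
            return self._balance
        if action == "balance":
            return self._balance
        raise ValueError("action %r is not allowed on this object" % action)

savings = Account("Lee")
savings.receive("deposit", 100)    # message: receiver, action, data to act upon
print(savings.receive("balance"))  # -> 100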
That we try to apply engineering discipline to software development does not mean that we have all the answers about how to build applications. On the contrary, we still build systems that are not useful and thus are not used. Part of the reason for continuing problems in application development is that we are constantly trying to hit a moving target. Both the technology and the type of applications needed by businesses are constantly changing and becoming more complex. Our ability to develop and disseminate knowledge about how to successfully build systems for new technologies and new application types seriously lags behind technological and business changes.
Another reason for continuing problems in application development is that we are not always free to do what we like. It is hard to change habits and cultures carried over from the old way of doing things, and hard to get users to agree to a new sequence of events or an unfamiliar format for documentation.
You might ask, then, if many organizations don't use good software engineering practices, why should I bother learning them? There are two good answers to this question. First, if you never know the right thing to do, you have no chance of ever using it. Second, organizations will frequently accept evolutionary, small steps of change instead of revolutionary, massive change. You can learn individual techniques that can be applied without complete devotion to one way of developing systems. In this way, software engineers can speed change in their organizations by demonstrating how the tools and techniques enhance the quality of both the product and the process of building a system.
Data Base System
1、Introduction
The development of corporate databases will be one of the most important data-processing activities for the rest of the 1970s. Data will be increasingly regarded as a vital corporate resource, which must be organized so as to maximize their value. In addition to the databases within an organization, a vast new demand is growing for database services, which will collect, organize, and sell data.
The files of data which computers can use are growing at a staggering rate. The growth rate in the size of computer storage is greater than the growth in the size or power of any other component in the exploding data processing industry. The more data the computers have access to, the greater is their potential power. In all walks of life and in all areas of industry, data banks will change the scope of what it is possible for man to do. By the end of this century, historians will look back to the coming of computer data banks and their associated facilities as a step which changed the nature of the evolution of society, perhaps eventually having a greater effect on the human condition than even the invention of the printing press.
Some of the most impressive corporate growth stories of this generation are largely attributable to the explosive growth in the need for information.
The vast majority of this information is not yet computerized. However, the cost of data storage hardware is dropping more rapidly than other costs in data processing. It will become cheaper to store data on computer files than to store them on paper. Not only printed information will be stored. The computer industry is improving its capability to store line drawings, data in facsimile form, photographs, human speech, etc. In fact, any form of information other than the most intimate communications between humans can be transmitted and stored digitally.
There are two main technology developments likely to become available in the near future. First, there are electromagnetic devices that will hold much more data than disks but have much longer access times. Second, there are solid-state technologies that will give microsecond access times but smaller capacities than disks.
Disks themselves may be increased in capacity somewhat. For the longer-term future there are a number of new technologies currently being developed in research labs which may replace disks and may provide very large microsecond-access-time devices. A steady stream of new storage devices is thus likely to reach the marketplace over the next 5 years, rapidly lowering the cost of storing data.
Given the available technologies, it is likely that on-line data bases will use two or three levels of storage: one solid-state level with microsecond access time, and one or more electromagnetic levels with access times of a fraction of a second. If two, three, or four levels of storage are used, physical storage organization will become more complex, probably with paging mechanisms to move data between the levels; solid-state storage offers the possibility of parallel search operations and associative memory.
Both the quantity of data stored and the complexity of their organization are going up by leaps and bounds. The first trillion-bit on-line stores are now in use. In a few years' time, stores of this size may be common.
A particularly important consideration in data base design is to store the data so that they can be used for a wide variety of applications and so that the way they are used can be changed quickly and easily. On computer installations prior to the data base era it has been remarkably difficult to change the way data are used. Different programmers view the data in different ways and constantly want to modify them as new needs arise. Modification, however, can set off a chain reaction of changes to existing programs and hence can be exceedingly expensive to accomplish.
Consequently, data processing has tended to become frozen into its old data structures.
To achieve the flexibility of data usage that is essential in most commercial situations, two aspects of data base design are important. First, it should be possible to interrogate and search the data base without the lengthy operation of writing programs in conventional programming languages. Second, the data should be independent of the programs which use them, so that they can be added to or restructured without the programs being changed.
The work of designing a data base is becoming increasingly difficult, especially if it is to perform in an optimal fashion. There are many different ways in which data can be structured, and different types of data need to be organized in different ways. Different data have different characteristics, which ought to affect the data organization, and different users have fundamentally different requirements. So we need a kind of data base management system (DBMS) to manage data.
Data base design using the entity-relationship model begins with a list of the entity types involved and the relationships among them. The philosophy of assuming that the designer knows what the entity types are at the outset is significantly different from the philosophy behind the normalization-based approach.
The entity-relationship (E-R) approach uses entity-relationship diagrams. The E-R approach requires several steps to produce a structure that is acceptable to the particular DBMS. These steps, illustrated by the sketch after the list, are:
(1) Data analysis.
(2) Producing and optimizing the entity model.
(3) Logical schema development.
(4) Physical data base design process.
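As a loose illustration of steps (2) and (3), with hypothetical entity and column names, the entity model below has two entity types, Department and Employee, and a one-to-many relationship between them; the logical schema maps each entity type to a table and carries the relationship in a common column. The physical design of step (4), such as choosing an access method, is not shown.

import sqlite3

# Entity model (step 2): two entity types, Department and Employee, and a
# one-to-many relationship between them.
# Logical schema (step 3): one table per entity type; the relationship is
# carried by the dept_no column that Employee shares with Department.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (
    dept_no   INTEGER PRIMARY KEY,
    dept_name TEXT NOT NULL
);
CREATE TABLE employee (
    emp_no  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_no INTEGER REFERENCES department(dept_no)  -- the relationship
);
""")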
Developing a data base structure from user requirements is called data base design. Most practitioners agree that there are two separate phases to the data base design process: the design of a logical data base structure that is processable by the data base management system (DBMS) and describes the user's view of data, and the selection of a physical structure such as the indexed sequential or direct access method of the intended DBMS.
Current data base design technology shows many residual effects of its outgrowth from single-record file design methods. File design is primarily application-program dependent, since the data have been defined and structured in terms of the individual applications that use them. The advent of the DBMS revised the emphasis in data and program design approaches.
There are many interlocking questions in the design of data-base systems and many types of technique that one can use to answer them; so many, in fact, that one often sees valuable approaches being overlooked in the design and vital questions not being asked.
There will soon be new storage devices, new software techniques, and new types of data bases. The details will change, but most of the principles will remain. Therefore, the reader should concentrate on the principles.
2、Data base system
The concepts used for describing files and data bases have varied substantially, even within the same organization.
A data base may be defined as a collection of interrelated data stored together with as little redundancy as possible to serve one or more applications in an optimal fashion; the data are stored so that they are independent of the programs which use them; a common and controlled approach is used in adding new data and in modifying and retrieving existing data within the data base. One system is said to contain a collection of data bases if they are entirely separate in structure.
A data base may be designed for batch processing, real-time processing, or in-line processing. A data base system involves application programs, a DBMS, and the data base itself.
One of the most important characteristics of most data bases is that they will constantly need to change and grow. Easy restructuring of the data base must be possible as new data types and new applications are added. The restructuring should be possible without having to rewrite the application programs and in general should cause as little upheaval as possible. The ease with which a data base can be changed will have a major effect on the rate at which data-processing applications can be developed in a corporation.
The term data independence is often quoted as being one of the main attributes of a data base. It implies that the data and the application programs which use them are independent, so that either may be changed without changing the other. When a single set of data items serves a variety of applications, different application programs perceive different relationships between the data items. To a large extent, data-base organization is concerned with the representation of relationships between data items and records as well as how and where the data are stored. A data base used for many applications can have multiple interconnections between the data items about which we may wish to record information. It can describe the real world. A data item represents an attribute, and the attribute must be associated with the relevant entity. We assign values to the attributes; one attribute has a special significance in that it identifies the entity.
An attribute or set of attributes which the computer uses to identify a record or tuple is referred to as a key. The primary key is defined as the key used to uniquely identify one record or tuple. The primary key is of great importance because it is used by the computer in locating the record or tuple by means of an index or an addressing algorithm.
If the function of a data base were merely to store data, its organization would be simple. Most of the complexities arise from the fact that it must also show the relationships between the various items of data that are stored. The data can be described logically or physically, and the two descriptions differ.
The logical data base description is referred to as a schema.
A schema is a chart of the types of data that are used. It gives the names of the entities and attributes, and specifies the relations between them. It is a framework into which the values of the data items can be fitted.
We must distinguish between a record type and an instance of the record. When we talk about a "personnel record", this is really a record type. There are no data values associated with it.
The term schema is used to mean an overall chart of all of the data-item types and record types stored in a data base. Many different subschemas can be derived from one schema.
The schema and the subschema are both used by the data-base management system, the primary function of which is to serve the application programs by executing their data operations.
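A loose sqlite3 illustration of the distinction, with invented table and view names: the schema charts every record type and data-item type, while a subschema derived from it exposes only the part that one application needs.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Schema: the overall chart of the record types and data-item types stored.
CREATE TABLE personnel (
    emp_no  INTEGER PRIMARY KEY,
    name    TEXT,
    salary  REAL,
    dept_no INTEGER
);
-- Subschema: one application's view of the same data, derived from the schema;
-- here the payroll program never sees dept_no.
CREATE VIEW payroll_view AS
    SELECT emp_no, name, salary FROM personnel;
""")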
A DBMS will usually be handling multiple data calls concurrently. It must organize its system buffers so that different data operations can be in process together. It provides a data definition language to specify the conceptual schema and, most likely, some of the details regarding the implementation of the conceptual schema by the physical schema. The data definition language is a high-level language, enabling one to describe the conceptual schema in terms of a "data model".
The choice of a data model is a difficult one, since it must be rich enough in structure to describe significant aspects of the real world, yet it must be possible to determine fairly automatically an efficient implementation of the conceptual schema by a physical schema. It should be emphasized that while a DBMS might be used to build small data bases, many data bases involve millions of bytes, and an inefficient implementation can be disastrous.
We will discuss data models in the following section.
3、Three Data Models
Logical schemas are defined as data models with the underlying structure of particular database management systems superimposed on them. At the present time, there are three main underlying structures for database management systems. These are:
Relational
Hierarchical
Network
The hierarchical and network structures have been used for DBMS since the 1960s. The relational structure was introduced in the early 1970s.
In the relational model, the entities and their relationships are represented by two-dimensional tables. Every table represents an entity and is made up of rows and columns. Relationships between entities are represented by common columns containing identical values from a domain or range of possible values.
The end user is presented with a simple data model. His or her requests are formulated in terms of the information content and do not reflect any complexities due to system-oriented aspects. A relational data model is what the user sees, but it is not necessarily what will be implemented physically.
The relational data model removes the details of storage structure and access strategy from the user interface. The model thus provides a relatively high degree of data independence. To be able to make use of this property of the relational data model, however, the design of the relations must be complete and accurate.
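A minimal sketch of the relational idea in plain Python, with invented relation and column names: each relation is a two-dimensional table of rows and columns, and the relationship between two relations is carried entirely by identical values in a common column rather than by any stored pointer.

# Each relation is a two-dimensional table: a list of rows (column -> value).
department = [
    {"dept_no": 1, "dept_name": "Sales"},
    {"dept_no": 2, "dept_name": "Research"},
]
employee = [
    {"emp_no": 10, "name": "Chen", "dept_no": 2},
    {"emp_no": 11, "name": "Park", "dept_no": 1},
]

# The relationship between the two relations is expressed only through
# identical values in the common dept_no column, drawn from the same domain.
joined = [
    {**e, "dept_name": d["dept_name"]}
    for e in employee
    for d in department
    if e["dept_no"] == d["dept_no"]
]
print(joined)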
Although some DBMS based on the relational data model are commercially available today, it is difficult to provide a complete set of operational capabilities with the required efficiency on a large scale. It appears today that technological improvements in providing faster and more reliable hardware may answer this question positively.
The hierarchical data model is based on a tree-like structure made up of nodes and branches. A node is a collection of data attributes describing the entity at that point. The highest node of the hierarchical tree structure is called a root. The nodes at succeeding lower levels are called children.
A hierarchical data model always starts with a root node. Every node consists of one or more attributes describing the entity at that node. Dependent nodes can follow at the succeeding levels. The node in the preceding level becomes the parent node of the new dependent nodes. A parent node can have one child node as a dependent or many child nodes. The major advantage of the hierarchical data model is the existence of proven database management systems that use the hierarchical data model as the basic structure. There is a reduction of data dependency, but any child node is accessible only through its parent node, and the many-to-many relationship can be implemented only in a clumsy way. This often results in redundancy in stored data.
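A small Python sketch of the structure, with invented node names and attributes: a root node, dependent child nodes each holding the attributes that describe their entity, and access to any node only by walking down from its parent.

class Node:
    """One node of a hierarchical data base: its attributes plus its children."""

    def __init__(self, **attributes):
        self.attributes = attributes
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        return child

# A root node, then dependent nodes at succeeding levels.
root = Node(dept_name="Research")
emp = root.add_child(Node(emp_no=10, name="Chen"))
emp.add_child(Node(skill="databases"))

# Any child is reachable only by walking down from its parent.
for child in root.children:
    print(child.attributes, [g.attributes for g in child.children])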
The network data model interconnects the entities of an enterprise into a network. In the network data model a data base consists of a number of areas. An area contains records. In turn, a record may consist of fields. A set, which is a grouping of records, may reside in an area or span a number of areas. A set type is based on the owner record type and the member record type. The many-to-many relationship, which occurs quite frequently in real life, can be implemented easily. The network data model is very complex, however, and the application programmer must be familiar with the logical structure of the data base.
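A rough Python sketch of a set type, with invented record and set names: an owner record is linked to its member records, and a many-to-many relationship would be built from two such set types meeting in a connecting record.

class Record:
    """A record made up of fields; it can own members in one or more named sets."""

    def __init__(self, **fields):
        self.fields = fields
        self.member_sets = {}   # set name -> list of member records

    def connect(self, set_name, member):
        self.member_sets.setdefault(set_name, []).append(member)
        return member

# A set type: the department record is the owner, employee records are members.
dept = Record(dept_name="Sales")
dept.connect("works_in", Record(name="Chen"))
dept.connect("works_in", Record(name="Park"))

# A many-to-many relationship (say, employees and projects) would be built
# from two such set types meeting in a connecting record, so the same record
# can be reached from more than one owner.
print([m.fields for m in dept.member_sets["works_in"]])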
4、Logical Design and Physical Design
Logical design of databases is mainly concerned with superimposing the constructs of the data base management system on the logical data model. There are three main models: hierarchical, relational, and network, as we have mentioned above.
The physical model is a framework of the database to be stored on physical devices. The model must be constructed with every regard given to the performance of the resulting database. One should carry out an analysis of the physical model with average frequencies of occurrence of the groupings of the data elements, with expected space estimates, and with respect to time estimates for retrieving and maintaining the data.
The database designer may find it necessary to have multiple entry points into a database, or to access a particular segment type with more than one key. To provide this type of access, it may be necessary to invert the segment on the keys. The physical designer must have expertise in the functions of the DBMS, an understanding of the characteristics of direct access devices, and knowledge of the applications.
Many data bases have links between one record and another, called pointers. A pointer is a field in one record which indicates where a second record is located on the storage devices.
Records exist on storage devices in a given physical sequence. This sequencing may be employed for some purpose. The most common purpose is that records are needed in a given sequence by certain data-processing operations, and so they are stored in that sequence.
Different applications may need records in different sequences.
The most common method of ordering records is to have them in sequence by a key—that key which is most commonly used for addressing them. An index is required to find any record without a lengthy search of the file.
If the data records are laid out sequentially by key, the index for that key can be much smaller than if the records are nonsequential.
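A toy Python illustration, with invented keys and block sizes: because the records are stored in key sequence, a sparse index that holds only the highest key of each block is enough to locate any record.

import bisect

# Records stored in key sequence, grouped into physical blocks.
blocks = [
    [(5, "a"), (9, "b"), (12, "c"), (20, "d")],
    [(23, "e"), (31, "f"), (40, "g"), (44, "h")],
]

# Because the file is in key sequence, a sparse index holding only the
# highest key of each block is enough to locate any record.
index = [block[-1][0] for block in blocks]   # -> [20, 44]

def find(key):
    b = bisect.bisect_left(index, key)       # which block could hold the key?
    if b == len(blocks):
        return None                          # key is beyond the last block
    for k, value in blocks[b]:               # short search inside one block
        if k == key:
            return value
    return None

print(find(31))   # -> f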
Hashing has been used for addressing random-access storages since they first came into existence in the mid-1950s. But nobody had the temerity to use the word hashing until 1968.
Many systems analysts have avoided the use of hashing in the suspicion that it is complicated. In fact, it is simple to use and has two important advantages over indexing. First, it finds most records with only one seek, and second, insertions and deletions can be handled without added complexity. Indexing, however, can be used with a file which is sequential by prime key, and this is an overriding advantage for some batch-processing applications.
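A minimal sketch of hash addressing in Python, with an invented bucket count and key format: the key is transformed directly into a bucket address, so most records are found with a single seek and insertions need no index maintenance; a full bucket simply overflows into a short chain.

BUCKETS = 8
store = [[] for _ in range(BUCKETS)]        # each bucket models one storage slot

def address(key):
    """Hashing: transform the key directly into a bucket address."""
    return hash(key) % BUCKETS

def insert(key, record):
    store[address(key)].append((key, record))   # no index to maintain

def find(key):
    for k, record in store[address(key)]:       # usually one "seek" reaches the record
        if k == key:
            return record
    return None

insert("EMP-10", {"name": "Chen"})
print(find("EMP-10"))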
Many data-base systems also use chains to interconnect records. A chain refers to a group of records scattered within the files and interconnected by a sequence of pointers. The software that is used to retrieve the chained records will make them appear to the application programmer as a contiguous logical file.
The primary disadvantage of chained records is that many read operations are needed in order to follow lengthy chains. Sometimes this does not matter because the records have to be read anyway. In most search operations, however, the chains have to be followed through records which would not otherwise be read. In some file organizations the chains can be contained within blocked physical records so that excessive reads do not occur.
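A small Python sketch of a chain, with an invented record layout: each record carries a pointer field giving the location of the next record, and the retrieval routine follows the pointers so that the scattered records appear to the program as one contiguous logical file.

# Records scattered over the file by physical address; each record's "next"
# field is a pointer to the address of the following record in the chain.
stored = {
    104: {"data": "order 1", "next": 310},
    310: {"data": "order 2", "next": 77},
    77:  {"data": "order 3", "next": None},   # end of the chain
}

def read_chain(head):
    """Follow the pointers; one read is needed for every link in the chain."""
    addr = head
    while addr is not None:
        record = stored[addr]
        yield record["data"]
        addr = record["next"]

print(list(read_chain(104)))   # appears to the program as one contiguous file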
Rings have been used in many file organizations. They are used to eliminate redundancy. When a ring or a chain is entered at a point some distance from its head, it may be desirable to obtain the information at the head quickly without stepping through all the intervening links.
5、Data Description Languages
It is necessary for both the programmers and the data administrator to be able to describe their data precisely; they do so by means of data description languages. A data description language is the means of declaring to the data-base management system what data structures will be used.
A data description language giving a logical data description should perform the following functions (a toy illustration follows the list):
It should give a unique name to each data-item type, file type, data base, and other data subdivision.
It should identify the types of data subdivision such as data item, segment, record, and data base file.
It may define the type of encoding the program uses in the data items (binary, character, bit string, etc.).
It may define the length of the data items and the range of values that a data item can assume.
It may specify the sequence of records in a file or the sequence of groups of records in the data base.
It may specify means of checking for errors in the data.
It may specify privacy locks for preventing unauthorized reading or modification of the data. These may operate at the data-item, segment, record, file, or data-base level and, if necessary, may be extended to the contents (values) of individual data items. The authorization may, on the other hand, be separately defined. It is more subject to change than the data structures, and changes in authorization procedures should not force changes in application programs.
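A toy logical data description in Python; the record name, item names, lengths, ranges, and privacy lock are all invented. It names each data-item type, states its encoding, length, and allowable range of values, and attaches a privacy lock, while saying nothing about addressing, indexing, or physical placement.

# A toy logical data description: names, encodings, lengths, value ranges,
# sequence, and a privacy lock, but no addressing, indexing, or placement.
EMPLOYEE_RECORD = {
    "record": "employee",
    "items": {
        "emp_no": {"encoding": "binary", "length": 4, "range": (1, 99999)},
        "name":   {"encoding": "character", "length": 30},
        "salary": {"encoding": "binary", "length": 4, "range": (0, 500000),
                   "privacy_lock": "payroll-clerk-only"},
    },
    "sequence": "emp_no",   # sequence of records in the file
}

def validate(item_name, value):
    """Error checking driven purely by the logical description."""
    item = EMPLOYEE_RECORD["items"][item_name]
    low, high = item.get("range", (None, None))
    return low is None or low <= value <= high

print(validate("salary", 42000))   # -> True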
A logical data description should not specify addressing, indexing, or searching techniques, or specify the placement of data on the storage units, because these topics are in the domain of physical, not logical, organization. It may give an indication of how the data will be used or of searching requirements, so that the physical techniques can be selected optimally, but such indications should not be logically limiting.
Most DBMS have their own languages for defining the schemas that are used. In most cases these data description languages are different from other programming languages, because other programming languages do not have the capability to define the variety of relationships that may exist in the schemas.