Sample System Administrator Manual
Bruce C. Gabrielson, PhD
Note: I put this together in 1993 from a number of sources after not finding a manual on the net that covered the security aspects of system administration and management. Please feel free to use it as you see fit.
This manual addresses system operations requiring management and control on a day-to-day basis. It is intended as a generic overview of system administrator responsibilities, including those activities that directly relate to the security of the server under their control.
System Administrator Responsibilities
System administrators (also sometimes referred to as system managers) have a significant responsibility in the networked environment. Control is a major issue. As more computing resources are added to a network, the job of maintaining control becomes harder. However, while budgets for traditional administration functions are limited but usually adequate, budgets to perform security functions are shrinking, and there are limited personnel available to address the multitude of problems. Government and business organizations normally consider Information System (IS) security an overhead cost, and therefore a prime candidate for budget cuts. With weak funding, it is increasingly difficult to hire staff, despite a growing workload and a demonstrated need.
The job of just identifying intrusions and network weaknesses gets more difficult as sophisticated problems become much less conspicuous. In an environment with many networks and network types, it has become a complex issue to simply follow administrator guidelines provided by others with unique LAN problems of their own. Just to inform the network administrator for a particular LAN of all the potential problems or fixes possible, or even which known generic network problems might not be directly applicable, is a major undertaking.
Simple network administration isn't as effective as it once was. With the emergence of vulnerability issues, there is an entire new area of responsibility to consider. The job of the system administrator has become a combination of user control and administration, monitoring for sophisticated attacks, identification of what was tampered with, and finally implementing fixes to prevent future problems. Each of these responsibilities will be addressed herein.
Overall System Operation
There are seven functional areas supported by the systems administrator:
Administrative Control Procedures
Administrative control procedures are necessary to ensure correct operation of the system from a managerial standpoint. These procedures include:
The responsibility for administrative control procedures is typically shared by the system administrator and the local database administrator. This division of responsibilities is not absolute. For small systems, a single person may be charged with both these responsibilities. For very large systems, supplemental staff may be utilized to handle specific responsibilities, such as responding to user inquiries or controlling system access.
Responsibilities are generally delegated to staff based upon the size and utilization of a particular system. The responsibilities for the administrative control procedures for smaller or less active systems may be assigned to a single individual. A large or very active system may have two staff members involved, one to carry out the system management functions and one to support database management requirements.
Control of System Access
A primary system administrative responsibility is the control of procedures for system access in order to maintain data integrity, availability and confidentiality. The person in charge of system access should assign access capabilities based on an individual user's authority to perform specific application functions. System access should be granted based on job requirements. Some users may have authority only to enter or look at data, while others may need to update and delete data. Access to system outputs such as reports or statistical information may also need to be controlled.
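As a minimal sketch of assigning access by job requirement, the following Python fragment maps job roles to the operations each role may perform. The role names and operation names are hypothetical and chosen only for illustration; a real system would take them from its own access tables.

```python
# Hypothetical roles and the operations each job function requires.
ROLE_PERMISSIONS = {
    "data_entry": {"enter", "view"},
    "supervisor": {"enter", "view", "update", "delete"},
    "auditor": {"view", "report"},
}


def is_allowed(role, operation):
    """Return True if the given role is authorized for the operation."""
    # Unknown roles receive an empty permission set, so access is denied.
    return operation in ROLE_PERMISSIONS.get(role, set())
```

Denying access to any role that is not listed keeps the default on the safe side: a new user can do nothing until a role is assigned.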
Parameter and Table Specification
An important area of system control involves parameter specification and table maintenance. A parameter is defined as an individual variable or constant stored in a file, while a table is a file containing multiple parameters having similar characteristics. Parameters and tables that are external to individual programs are often used to vary a system's operation. These may also be used to control a user's functional access authority. Without the use of these parameters and tables, program source code would require modification every time a variable needed to be changed.
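The idea of keeping parameters outside the program can be sketched as follows. The NAME=VALUE file layout and the parameter names are assumptions made for this example, not a format prescribed by this manual.

```python
def load_parameters(text):
    """Parse NAME=VALUE lines into a dictionary, skipping blanks and comments."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        params[name.strip()] = value.strip()
    return params


# Hypothetical contents of an external parameter file.
SAMPLE = """
# payroll parameters
PAY_PERIOD_DAYS = 14
MAX_LOGIN_ATTEMPTS = 3
"""

params = load_parameters(SAMPLE)
pay_period = int(params["PAY_PERIOD_DAYS"])
```

Changing the pay period then means editing one line in the parameter file; the program source is left untouched.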
There are four benefits to using parameters and table files to control system performance:
Supervisory Control of the Production Process
Supervisory control of the production process refers to a system of administrative checks which ensure proper utilization and operation of the system. Supervisory controls are established to assist in monitoring the day-to-day operation of a system. They assure valid, proper entry and maintenance of data, accurate performance of input/output procedures, authorized user interaction and special processing requests. They are constructed to ensure that maintenance transactions are handled properly, that verification and approval procedures are in place, and that, if necessary, incident investigations and recovery are monitored. These controls help prevent the "garbage in, garbage out" problem typical of poorly developed systems. They act as the checks and balances for smoothly operating systems.
Supervisory controls can either be built-in functions of the software itself or documents designed for this use. Examples of these include job request forms, system manuals, procedural checklists, audit trails, and system performance/exception reports.
Archiving refers to the creation of copies of data for historical reference purposes. The philosophy that governs the backing up of files is that a copy of a master, transaction, or table file is made to ensure that a copy is available daily if anything should happen to the original. The philosophy of archiving, by contrast, is to ensure that copies are created for long-term storage of data. Archiving procedures should be designed for systems of all sizes.
Computer Center Support
Since individual system operation is impacted by computer center operations, it is important to understand the activities and controls provided by the computer center. In the context of this guidance, the concept of a "computer center" relates to the software necessary to support a system application. Hardware peripheral support is covered in another section. Computer center support refers to the activities performed by the technical support staff within the computer center environment which help ensure each system will operate properly on a daily basis. This concept is normally associated with the operations of a major computer center. The elements of computer center support include:
Computer Software Operations
Computer operations encompass those activities that are carried out to maintain a viable system environment. The system environment is comprised of computer hardware, system software, and communication devices. System software includes the operating system and other required software, whether a traditional third-generation language (COBOL, FORTRAN, PASCAL), a package (LOTUS, SAS) or database management system (ADABAS, dBASE).
The user's responsibility for maintenance of the system environment is dependent on the size of the system used. Personal Computer (PC) users are directly responsible for maintaining a viable computer environment. All PC-based system components generally reside on or near the user's workstation and are under his/her direct control. The individual user is usually responsible for supporting the system environment which includes the following tasks:
Mainframe and most mini-computer operations are geographically removed from the user and centralized in a single location. These large-scale computer environments require both a means for the user to communicate with the computer resources and a staff dedicated to making these resources available to each user as needed. Typical system environment support performed by the computer center staff includes:
Mainframe Production Control
Production control operations refer to the activities that support the implementation and periodic processing of various systems. Typical production control activities for a large mainframe operation include:
Responsibilities for the individual production control activities vary with the size of the computer center. Individual users are directly responsible for the processing activities associated with their PC-based applications. Mini and mainframe based applications such as payroll, accounting, and national program systems are generally serviced by computer center staff.
Two production control activities that stand out in terms of their significance to the protection of an application system are backup and recovery, and disaster planning and recovery.
Backup and Recovery
Backup procedures should be designed to protect against any possible loss of data. Duplicate copies of data are made periodically to ensure that a copy of the generated work will always exist, even if the master copy of the data is damaged or destroyed. There is always a chance that data can be lost through human error, hardware or software failure, or catastrophic disaster. For smaller PC-based systems residing on individual workstations, data backup is the individual's responsibility. In most instances, simply creating a backup floppy disk and storing it in a location removed from the workstation is sufficient.
For larger systems (mini- and mainframe), system developers should always assume the worst-case scenario when designing any backup system. This backup system will consist of procedures which are established to maintain and store recent versions of the information residing within a system.
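A procedure that maintains and stores recent versions can be sketched in a few lines of Python. The function name, the timestamp naming scheme, and the default of three retained copies are assumptions for illustration; a production backup system would also write to separate media and an off-site location.

```python
import shutil
import time
from pathlib import Path


def back_up(source, backup_dir, keep=3):
    """Copy source into backup_dir under a timestamped name and keep only
    the most recent `keep` copies (keep must be at least 1)."""
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{source.name}.{stamp}"
    shutil.copy2(source, target)  # copy2 also preserves modification times
    # Timestamped names sort chronologically, so the oldest copies come first.
    copies = sorted(backup_dir.glob(source.name + ".*"))
    for old in copies[:-keep]:
        old.unlink()
    return target
```

Because the timestamp has one-second resolution, two runs within the same second would write to the same name; a scheduled daily or hourly job would not encounter this.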
Disaster Planning and Recovery
Disaster planning and recovery refers to the plan or set of procedures that is designed to counter any physical destruction of or damage to the hardware resources, including the server. To recover from an event which could affect multiple IS computing resources, the principal requirement is to recover the capability to perform equivalent processing in the shortest possible time period after the incident. The second requirement is to recover with the least economic burden. This recovery capability could take the form of either stand-alone processing or network resources and operations. Most incidents will not be large enough to require full implementation of a disaster recovery or an incident response plan.
Individual PC-based systems should consider the standard data backup and recovery procedures normally available corporate-wide as sufficient for their own equipment disaster planning. Data that has been properly backed up onto other media and stored in a location removed from the system would be readily available for use on another system if necessary.
The third area of daily system operation involves user interaction. This subject consists of access techniques and operating procedures to be followed by a user of the system. The specific topics addressed in this section are:
The primary purpose of any support service is to raise the user's productivity by making them more comfortable with available technology, while concurrently improving their skill at using available software to update, access and manipulate information. In this regard, the person tasked with the administrative system management responsibilities, the system administrator, should generally be the first point of contact concerning user-support issues. For very large systems where the number of calls is great, the first point of contact may be a member of a user-support team. For most smaller systems, the system administrator or a designee will become the user's interface with a specific system or application. This will promote consistency and coordination of resources.
In order to provide effective user support, the system administrator should be responsible for:
In the case of large systems, the supporting organization should be cognizant of the mechanics of the system, since in many instances the system administrator may not be able to accommodate the large number of inquiries from users.
The system administrator may also assist users by providing necessary system documentation, forwarding advisories to disseminate system security information, and moderating user groups. The system administrator should also assist the user in acquiring appropriate training in such areas as use of the reporting functions of the system. This eliminates the need for users to contact the system administrator for assistance each time an ad-hoc report is needed.
System Access Techniques
System access techniques are the means by which a user interacts with an automated system. These techniques vary depending on the type of system, the type of terminal, and the input and output required. In order to ensure that users are able to utilize systems with ease, system access techniques must be developed and clearly and simply documented for users.
Step-by-step procedures for gaining access to a system must be developed and made available to the user. If several terminal types, communications or machine access methods are available, the user must be given instruction in their use.
System access techniques may vary among systems due to differing levels of sophistication of the intended system users. A system designed for infrequent or less technically oriented users often relies on extensive use of menus. Where the user is more technically oriented, a more detailed, and therefore more complicated, system interface utilizing direct system commands allows the user more control over system operation.
Data Entry and Update Procedures
Interaction with an automated system for data entry and update may vary depending on the type and function of the system. Data input and update schedules, procedures, and security requirements are governed by the volatility and sensitivity of the data being processed.
The timing of data input is tied directly to the nature of the system. For example, payroll system updates are driven by the bi-weekly payroll processing cycle. In this case, the timing of the update is critical. On the other hand, maintenance of archival and reference information may not be tied to a specific processing cycle. This allows the designated system administrator to establish an appropriate archival schedule.
Systems can have several levels of data entry or update approval. Some users may have authority only to enter or look at data, other users to update and delete data. The system administrator controls procedures for each type of access in order to maintain data integrity, availability, and confidentiality.
A system's analytical and report product is intended to support the user in performing his/her job and to meet organizational record keeping requirements. Software User's Reference Guides are used to describe these capabilities in such a way that the user can easily see the utility in the provided capabilities as well as the direct correlation between the system's tools and the user's task requirements. Failure to make this link readily apparent will result in the underutilization or non-utilization of the system.
The timing of system outputs is relevant to the nature of the output formats. For example, outputs of systems which are event-driven usually generate periodic reports. Systems supporting less time-critical functions may have a more flexible reporting schedule. Systems often provide a number of options for obtaining information upon request. These ad hoc requests may allow the user to specify fixed or variable report formats and the characteristics of selected data. The more flexibility a system has in this regard, the more utility the system has for the user. The flexibility allows the user to undertake a focused analysis rather than having to glean desired information from a voluminous report.
The utility of a system is enhanced by having system output available on a variety of media, including paper, magnetic tape, or floppy disk. Flexibility in providing output enables a system to support multiple organizations, management levels, or equipment types. The additional availability of machine-readable output will enable users to incorporate information electronically into affiliated systems for further detail analysis or aggregation.
The delivery process by which users receive outputs is important for all but single-user PC applications. The opportunity to specify the output location, such as a local or remote printer or terminal screen, is a useful system function. Many systems have a wide range of options of this type, including transmission of system output either within the Navy or nationwide, which is helpful for transmitting administrative data between locations.
Training is a critical element for the effective operation of any application. A variety of training options should be available, from traditional classroom instruction to user models and computer-based training. The system administrator must determine which methods are most applicable to meet different user requirements. One of the critical areas of training is computer security. The system administrator must maintain close coordination with the IS security organization to ensure appropriate computer security training is both available and provided to users of the equipment. The system administrator must also receive annual security training in order to keep current on emerging security issues related to the equipment they manage.
Users with varying degrees of skill and experience may determine the extent to which the organization can provide effective initial and follow-up training. This acknowledgment of skill level will allow the training program to more accurately meet user needs.
To develop an effective training program, the person or group responsible for end-user training will need to establish guidelines and determine who needs to be trained, what applications or systems will be taught, and what training method will be used. The following procedures will help maximize the benefits of any user training program:
The system administrator should survey managers and supervisors to identify the organization's goals, objectives, programs and projects. This uncovers potential needs and provides a strong direction for and commitment to the training.
The best training method depends on users' needs, learning objectives and available resources. Users are often primarily concerned with what they will need to know in order to perform their jobs. They are often unwilling to invest long study hours, preferring to focus on training that directly relates to their day-to-day experience. Any training program should address this perspective in order to provide effective user training.
One efficient way to design a user training program is to categorize users by organizational or skill level. The four organizational levels of users are executive, managerial, technical and administrative. Each level of user requires a different focus and approach to training. Executives and managers usually have little time available for training. Courses for these users should emphasize payback and results from a management perspective. Technical users typically adapt to new technology readily and welcome the introduction of new tools and methods. This category of user is usually more receptive to experimentation and is willing to explore new techniques during a training program. Administrative users are interested primarily in developing skills for their specific work applications. This may involve training on specific applications, such as word processing, presentation graphics, spreadsheets and the applicable computer equipment. Another important aspect of training is the need for staff retraining precipitated by extensive changes made to the system software during a maintenance cycle.
Follow-up after training is essential. Success is measured by whether users continue to employ and expand the concepts and skills acquired during the training sessions. An analysis should be made to determine which end users are benefiting most from the training, which are not benefiting, and most importantly, why not. This information can then be factored into restructuring the future training curriculum.
Application Software Operation
The operation of the application software is the responsibility of the System Operator or operations personnel. These personnel are responsible for systems operation activities, including operating the hardware and maintaining the stability of the application software for the user community.
The following specific activities of application software operation are described in this section:
During system implementation, internal files and tables are initialized with a baseline of data and operational parameters. In some cases, this initial implementation is sufficient to establish and support operations for the life of the system. In other cases, it may be necessary to re-initialize the system files and tables at the start of a new operating interval. A step-by-step process for initialization, including a list of the required input parameters and a description of the update procedures, must be developed and documented. The system administrator will have the responsibility for defining and specifying system parameters and re-initialization requirements to the system operators in the final documentation.
Error Detection and Recovery
Error detection and recovery procedures are among the responsibilities of system administration personnel. A variety of system failures can result in system malfunction. Such failures occur as a result of data errors, program errors, or equipment malfunctions. The system operator can determine the type and seriousness of errors, if any, that have occurred as a result of a system failure by reference to system error messages and other diagnostics.
Program or data errors -- program error messages are displayed on the operator's terminal screen or the user's screen; these messages are developed as part of the detailed system design phase of the software life cycle.
Equipment errors/failures -- hardware malfunctions produce error messages displayed either by the hardware itself or by the mainframe operating system, if applicable. These are normally described in the operating guide supplied with the hardware at time of purchase and are often reiterated in the
System Communication Interfacing
System interfacing procedures become more critical in system operations as increasing numbers of systems interface for data exchange and storage purposes. Procedures for performing or maintaining system interfaces should be documented in a Software Operations Document and/or System Administrator's Manual. The major topics that need to be considered during the preparation of operating documentation of any system are prepared during System Design, Development and Implementation and updated during System Operations and Maintenance, as needed.
The Operational Baseline represents the completely implemented and tested software system. It is the basis for future maintenance changes and enhancements. It is established following a successful Operational Test and Evaluation Review and after it has been placed in production and/or turned over to the user.
Disaster Recovery Plan
Computer hardware consists of the computer and an array of input and output devices including printers, tape drives, terminals, and disk drives. Communication devices include the modems and communication lines that allow access between computers for sharing data and processing capabilities.
In a Local Area Network (LAN) environment, a LAN Administrator is assigned the responsibility of operating and maintaining the LAN. The LAN Administrator's responsibilities include overseeing or performing the following:
Network security is one of the most important responsibilities facing a system administrator. In this role, the administrator is confronted with two primary problems: the disgruntled employee and the outside cracker. A disgruntled employee can set up an internal bypass for later use, while crackers continually try to break into the network from outside. There is also the potential problem created by the legitimate employee who bypasses security protection in order to work at home via modem, or by a user who has a portable computer in order to work while on the road.
The threat of system hackers is an ever present thorn in the side of system administrators. In recent years, system crackers, pirates, etc. have seen an increase in their ranks, with crackers by far the most dominant. Part of the reason for this increase is that access to the internet has become a part of everyday life for most college students, offering a prime target for those with an interest in cracking for a few hours of stimulating entertainment.
The destructive cracker wants to insert viruses, Trojan horses, time bombs, and worms. The most common cracker wants to use your system for free or find out if you have any interesting software he can acquire. The directed cracker wants to find out everything he can about your operation and business for personal use or sale to others.
Installing System Patches
Nearly all operating systems have bugs that can be used, either maliciously or accidentally, to defeat system security measures. When bugs are identified, they should be fixed as soon as possible. What can a system administrator do to increase the level of security on his or her system? Often, most of the security problems on a machine are easily fixed: problems such as bad passwords, incorrect machine setup, default system parameters, etc. Since crackers usually have a list of things they look for, these same problem areas should be examined and the deficiencies corrected.
It is recommended that a system administrator investigate all sources of vulnerability information about their particular operating system. Once the security profile is understood, every patch recommended for the operating system should be installed. Since some patches restore default configurations, it's important that the most recent patches are in place before any further security precautions are taken.
As an example of a patchable vulnerability, there are known bugs in versions of the ftp daemon, ftpd, dated before December 1988. There are also known bugs in many later versions. The system administrator should make sure all patches are installed before running ftpd.
Removing the vulnerability once an incident has occurred is difficult, and requires a complete understanding of how the breach occurred. Sometimes an incident requires the system administrator to remove all access as soon as possible, and then restore operation to users in stages. However, this will notify the cracker that he has been discovered.
Recording System Defaults
When a computer is accessed by an unauthorized user, there is a possibility that system files and data will be changed or deleted. The system administrator needs to check the computer for deleted files, trojan horses, hidden files, etc., and then restore the computer to its original state. To do this, the system administrator must first record the state of the computer before it is available for general use.
For computers that don't come with a copy of the operating system, the first thing the administrator should do is a complete backup of all the system software. If the computer has a tape drive or an optical disk, this can be done immediately; otherwise, the administrator should do the least amount of system configuration required for a backup to be made.
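One way to record the state of a computer for later comparison is to store a checksum for every file and compare a fresh snapshot against it after a suspected incident. The sketch below assumes SHA-256 checksums and the function names shown; neither is prescribed by this manual, and the recorded snapshot should be kept off the computer so that an intruder cannot alter it.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each regular file under root to the SHA-256 digest of its contents."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and not path.is_symlink():
            name = str(path.relative_to(root))
            digests[name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare(baseline, current):
    """Return (added, removed, changed) file names between two snapshots."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    changed = sorted(
        name for name in baseline
        if name in current and baseline[name] != current[name]
    )
    return added, removed, changed
```

Files reported as added are candidates for trojan horses or hidden files, files reported as removed were deleted, and files reported as changed need to be restored from the backup.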
Recording SUID and SGID Programs
Before any software is added to the basic operating system release, the system administrator should check for SUID and SGID programs. If unauthorized access occurs, frequently the intruder will leave a program that enables privileged re-entry. The list of SUID and SGID programs should be stored both on and off the computer. The version on the computer can be used by a daily cron job to check for changes, while the version stored off the computer will ensure that even if root access is acquired, a record of the system's original state is available.
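The daily check described above might look like the following sketch, which walks a directory tree, lists the programs with the SUID or SGID bit set, and reports any that are missing from the recorded baseline. The function names are hypothetical, and how the baseline list is stored and scheduled is left to the administrator.

```python
import os
import stat


def find_setid_programs(root):
    """Return sorted paths of regular files under root with SUID or SGID set."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode  # lstat does not follow symlinks
            except OSError:
                continue  # file vanished or cannot be examined
            if stat.S_ISREG(mode) and mode & (stat.S_ISUID | stat.S_ISGID):
                found.append(path)
    return sorted(found)


def new_setid_programs(baseline, current):
    """Return programs present now that were not in the recorded baseline."""
    return sorted(set(current) - set(baseline))
```

Any path returned by new_setid_programs deserves immediate attention, since it was not part of the system as originally recorded.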
Check and Record Permissions on all Device Files
By changing the permissions on device files, an unauthorized user can gain access to devices, using this access to change files, impersonate another user, or listen in on conversations. World writable files and directories allow unauthorized users to add, delete or change existing configurations. There are, however, some directories that must be world writable.
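A check for world-writable files and directories can be sketched as below. The function name is an assumption for this example; the result would be compared by hand against the short list of directories that are meant to be world writable on the system in question.

```python
import os
import stat


def world_writable(root):
    """Return sorted paths under root that any user may write to.

    Symbolic links are skipped because their own mode bits are not meaningful.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # entry vanished or cannot be examined
            if not stat.S_ISLNK(mode) and mode & stat.S_IWOTH:
                found.append(path)
    return sorted(found)
```

Recording the output when the system is first installed gives a reference list against which later runs can be compared.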
User Password Control
A vulnerable computer can jeopardize the security of your entire network. The first line of defense against intruders is good password management. Most unauthorized use of systems happens through weak passwords. It is important, therefore, that the system administrator ensure that each user chooses a password that has a mixture of letters, numbers, and symbols, and that makes use of the full eight-character UNIX password field. Ensure that the password isn't a word or combination of words in the dictionary. '$2much4U' is an example of a good password.
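The password rules above can be expressed as a simple check. This is a sketch under stated assumptions: the function name and the word list passed in are hypothetical, and the check only rejects single dictionary words, not combinations of words.

```python
import string


def is_acceptable_password(password, dictionary_words):
    """Check for eight characters, a mixture of letters, digits, and
    symbols, and a value that is not itself a dictionary word."""
    if len(password) < 8:
        return False
    has_letter = any(c.isalpha() for c in password)
    has_digit = any(c.isdigit() for c in password)
    has_symbol = any(c in string.punctuation for c in password)
    if not (has_letter and has_digit and has_symbol):
        return False
    return password.lower() not in dictionary_words
```

With these rules, the example '$2much4U' is accepted, while a plain word such as 'password' is rejected for lacking digits and symbols.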
Users will base their idea of a proper password on the one that the system administrator first assigns to them. Users should be told to change their password immediately upon receipt of an account. The system administrator should record the first password on the user's account sheet and check it several days after the account has been issued to be sure that it has been changed.
Passwords and Shells on System Accounts
The system administrator should check the system password file to ensure that all accounts have passwords. Many vendors ship their computers with no passwords on the system accounts. System accounts such as bin, lp, and sync should have a '*' for the password field. No account should be left without a password.
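Checking for accounts without passwords amounts to looking for an empty second field in each line of the password file. The sketch below works on text in the traditional colon-separated format; the function name is hypothetical, and the sample entries, including the PLACEHOLDER password value, are made up for illustration.

```python
def accounts_without_password(passwd_text):
    """Return login names whose password field (the second
    colon-separated field) is empty."""
    names = []
    for line in passwd_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) >= 2 and fields[1] == "":
            names.append(fields[0])
    return names


# Made-up sample entries: name:password:uid:gid:comment:home:shell
SAMPLE = """root:PLACEHOLDER:0:0:Operator:/:/bin/sh
bin:*:3:3::/bin:
sync::1:1::/:/bin/sync
"""

unprotected = accounts_without_password(SAMPLE)
```

In this sample only the sync account is reported; bin is not, because its password field already holds '*'.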
Also, the system administrator should check to see if the computer comes with any passwords already assigned. Some vendors give default passwords to system accounts. Since anyone who has the same type of system knows what the default passwords are, the passwords should be changed immediately.
Every account needs to have a shell assigned to it. Most administrative accounts should have /bin/false as the shell, which would disallow crackers from gaining shell access using obscure system holes.
Avoid '.rhosts' Files
The biggest threat to the computer after weak passwords is the use of '.rhosts' files. Every user, including root, can create these files. The '.rhosts' file allows a user to log into the computer from a remote site without using a password. Since there is no guarantee that another system is secure, it is very dangerous to have these files on the computer. Users should be instructed not to use '.rhosts' files and the system administrator should write a cron script to remove '.rhosts' files.
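The removal script mentioned above could be built around a pair of functions like these, run periodically from cron. The function names are hypothetical, and the list of home directories is assumed to be supplied by the caller, for example from the password file.

```python
import os


def find_rhosts(home_dirs):
    """Return paths of .rhosts files found directly in the given home directories."""
    found = []
    for home in home_dirs:
        candidate = os.path.join(home, ".rhosts")
        if os.path.lexists(candidate):  # also catches dangling symlinks
            found.append(candidate)
    return found


def remove_rhosts(home_dirs):
    """Delete each .rhosts file found and return the list of removed paths."""
    removed = []
    for path in find_rhosts(home_dirs):
        os.remove(path)
        removed.append(path)
    return removed
```

Returning the removed paths lets the script log which users need a reminder about the policy.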
Expire Inactive Accounts
Computers with large numbers of users tend to have accounts that become inactive. The beginning of a new fiscal year often brings changes in who is using the computer, as users' funding sources change. The system administrator needs to be sensitive to those accounts that become inactive, and disable them by replacing the password field in the /etc/passwd file with an '*'. If the user has left important data on the computer, eventually they will contact the system administrator to make arrangements to retrieve the data. Once this data is retrieved, the account should be removed.
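Disabling an account by replacing its password field with '*' can be sketched as a text transformation on the password file. The function name and the sample entries are hypothetical; on a live system the file should be edited with the locking tools the operating system provides, not overwritten directly.

```python
def disable_account(passwd_text, login):
    """Return passwd-format text with the password field of `login` set to '*'."""
    out = []
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) >= 2 and fields[0] == login:
            fields[1] = "*"
            line = ":".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


# Made-up sample entries: name:password:uid:gid:comment:home:shell
BEFORE = (
    "alice:PLACEHOLDER:101:20:Alice:/home/alice:/bin/sh\n"
    "bob:PLACEHOLDER:102:20:Bob:/home/bob:/bin/sh\n"
)
AFTER = disable_account(BEFORE, "bob")
```

The user's files and home directory are left in place, so the data can still be retrieved before the account is finally removed.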
Restrict Root Login to the Console
The ability to log in to the root account should be restricted to the console. Anyone not at the console should have to use 'su' to become root. When system administrators use su to become root, they should not simply type 'su'. They should first remove any alias for su (unalias su), and then use the complete path name, e.g. /bin/su or /bin/su.wheel. This prevents the system administrator from falling victim to several different types of attacks, such as a counterfeit su program placed earlier in the search path. The ability to 'su' to root should also be restricted to members of the wheel group. This feature appears only on newer BSD systems, where it is the default.
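Where the wheel group restriction is in force, it is worth reviewing who is actually in that group. The Python sketch below reads the text of a group file and lists the members; it assumes the usual colon-separated layout (name:password:gid:member,member,...), and the function name and sample entries are hypothetical.

```python
def wheel_members(group_text, group_name="wheel"):
    """Return the list of login names in the named group (default: wheel)."""
    for line in group_text.splitlines():
        fields = line.strip().split(":")
        if len(fields) >= 4 and fields[0] == group_name:
            # The fourth field is a comma-separated member list.
            return [member for member in fields[3].split(",") if member]
    return []


# Hypothetical sample entries for illustration only.
group_sample = """wheel:*:0:root,alice
staff:*:10:bob
"""
print(wheel_members(group_sample))
```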
The system administrator should not establish accounts for guest usage. These accounts, often appearing as an account with login guest and password account, are common holes exploited by unauthorized users. Every user of the computer should undergo the same security procedures, receive the same security briefing, and be held accountable to the same standards. When users are finished using the computer, their accounts should be removed from the password file.
NIS presents several problem areas because of design flaws in the NIS code. There isn't much that can be done about the design flaws, so system administrators use NIS at their own risk. If the system administrator does decide to run NIS, he or she should verify that '+' entries appear only in files on the clients, not on the server. The administrator should also make sure that it is not possible to log into the clients as '+' with no password; if it is, the '+' entries need a '*' in their password fields.
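The '+' entry check can be sketched in Python as follows. This is a conservative sketch, assuming the colon-separated password file layout: it flags every line beginning with '+' whose password field is empty or missing, whether or not that particular form is exploitable on a given vendor's system. The function name and sample entries are hypothetical.

```python
def open_plus_entries(passwd_text):
    """Return the '+' lines whose password field is empty or absent."""
    flagged = []
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line.startswith("+"):
            continue  # ordinary local account, not an NIS marker
        fields = line.split(":")
        # A bare "+" has no password field at all; treat it as empty.
        password = fields[1] if len(fields) > 1 else ""
        if password == "":
            flagged.append(line)
    return flagged


# Hypothetical sample entries for illustration only.
nis_sample = """root:Xq1kz9pLm2abc:0:0:Super User:/:/bin/csh
+::0:0:::
"""
print(open_plus_entries(nis_sample))
```

An entry such as +:*:0:0::: passes this check, which matches the advice above to put a '*' in the password field.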
Incident response planning (responding to break-ins and assessing loss), virus control, remanence control, software piracy control, and software write protection control are all part of the system administrator's job. Major natural disasters including earthquakes, tornadoes, floods, fires, etc. can create any of a number of IS operational incidents. Backup responsibilities were discussed in a previous section of this manual.
With such a varied range of major and minor incidents to address, some recovery strategies can be applied to all incident types, while other strategies must be incident specific. The system administrator must understand what steps to take depending on the nature of the incident affecting his or her system.
Network Penetration Control
The system administrator must immediately notify the appropriate IS security organization when either of two types of incidents occurs: (1) a network security breach, or (2) notification through any source that a network vulnerability has been identified. A reporting process should also be in place for when virus attacks or denial of service attacks are identified.
System administrator follow-up after an incident should include an assessment of the factors that allowed the intrusion to occur, an update of the security policy that addressed the incident, and additional education for users and administrators.
The relationship between system administrator and IS security organization extends well beyond the notification process and follow-up. The system administrator must actively communicate with the IS security organization on a regular basis, and rely on this organization as a primary source of information concerning emerging vulnerability issues, system security test and evaluation, security training, and immediate response support.
Concerning system security test and evaluation, the IS security organization is usually the one most directly responsible for proactive security testing on the system administrator's equipment. Proactive security actively deals with the conditions and environment the computer system is operating in. Remote proactive security testing is a process whereby computer security personnel attempt to gain access to remote systems under their control. The process is much more involved than simply a sweep for information. In using a proactive approach, the security office runs a series of tests against remote machines in order to uncover known holes that were left uncorrected and to identify new problems that were previously unknown.
Proactive testing benefits the system administrator in that it gives him or her information that may not be discovered by looking at system files. Computer systems look very different from the outside, and a new perspective goes a long way toward clearing up problems, especially when that perspective is the same as the enemy's. An effective proactive testing program also gives the remote system administrator a feeling of security when testing is no longer effective against his or her machines.
Authorized computer security teams are required to use a dedicated evaluation effort, including whatever tools are available, to test the vulnerability and security of their systems. The advantages of knowing what can happen ahead of time far outweigh the results of what could happen if a potential adversary gained access to a protected or sensitive system. Finding a network's potential vulnerabilities, hidden viruses or any other intruder problem areas is difficult, but once found, these threats to a system can usually be eliminated expeditiously.
This manual described four major functional operations that are necessary for the effective and efficient day-to-day management and control of a system. First, the administrative control procedures which are performed by a system administrator or a Database Administrator include authorizing users to access the system, specifying system variables through parameters and tables, supporting users by responding to inquiries, and supervising the production process. Second, additional operational responsibilities which support the system are performed by technical personnel at the appropriate computer center. These functions should be understood and supplemented for each system, as needed. Third, the system must be responsive to user needs through proper user interaction, including allowing sufficient means of system access, ensuring accurate data through controlled data entry and update procedures, providing useful analysis and reporting products, and structuring effective training for users.
Finally, the software must be operated properly, ensuring initialization of the system, error detection and recovery, and proper interfacing with other systems. Procedures for operating the software and reference information for users are contained in support documentation.
SYSTEM ADMINISTRATOR RESPONSIBILITIES