Practical Realities of Computer-based Finding Aids: The NARS A-l Experience
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
The American Archivist / Vol. 42, No. 2 / April 1979 167 Practical Realities of Computer-based Finding Aids: The NARS A-l Experience ALAN CALMES THE NATIONAL ARCHIVES and Records System Study Service (NARS) A-l System is a com- At first, the NARS program analysts puter-assisted procedure for compil- asked whether gaining administrative ing all the series level inventories of control over record groups and com- the National Archives record groups piling series descriptions required the into one master file. Two main pur- creation of a new system. The A-l sys- poses of the system are: (1) to provide tem study consisted of an analysis of the Office of the National Archives existing procedures for describing and with administrative control over the controlling archival records. Program record groups (allocation to custodial analysts determined the adequacy of units and quantity control); and (2) to the various procedures, compiled a list compile all series descriptions into one of problems, identified types of infor- machine-readable file according to a mation of interest to archivists and re- standard format and a hierarchical ad- searchers, and assembled a variety of dressing scheme. The NARS A-l Sys- finding aids. In the process of evaluat- tem came about after an in-depth ing existing series descriptions, the background study between 1972 and program analysts identified strengths 1974 found the NARS administrative and weaknesses of each and outlined and descriptive programs inadequate methods to bring all the series descrip- for controlling archival records.1 The tions together into one system. It was design of the A-1 system took place in recognized that the format (layout of 1974-75, and the system went into information and the kinds of informa- production in 1976.2 tion) varied widely from one series de- 1 Claudine Weiher, "Control and Description of Records in the National Archives," unpublished, NARS-NAA, 4 vols. (1974). 2 Alan Calmes, "NARS A-l System Documentation," unpublished, NARS-NNB (15 August 1978).
168 The American Archivist—April 1979 scription to the next and that a stand- efit analysis evaluated the relative ard format ought to be adopted. In the worth of manual processing as com- final analysis, the study envisioned a pared to automated processing and system which would incorporate desir- concluded that costs would be compa- able new features, such as format con- rable. However, this was a moot point trol, into the existing system. since creation of the file would repre- The main consideration then be- sent about 60 percent of the total cost, came how to assemble and compile the and the Archives administration could formatted series descriptions into a not hire a large enough number of master file. Subject indexing was con- clerk typists and provide them with sidered impractical because a dictionary enough office space for typing and \ ,of terms would have to be developed maintaining an active file of complete and applied systematically to all series inventories of all the record groups. descriptions. It would make the previ- Therefore, to assist the manual opera- ously described series descriptions in- tions and reduce the need for addi- adequate and would require a consid- tional personnel and office space, erable amount of additional time for NARS management looked toward au- all future series descriptions. Indexing tomation. The A-l system study, which would require that an archivist identify cost $70,000,3 concluded that a com- appropriate index terms for each series puter-assisted system mainly for the description. This would slow down the purpose of text editing—sometimes decision-making process during series called "wordprocessing"—should be description writing. It was estimated implemented instead of a computer that such a process would triple the se- centered system designed for infor- ries description processing time. The mation retrieval by subject, which A-l system was supposed to accelerate would require the use of index terms. the production of series descriptions, not retard their production. Further- System Design and Implementation more, the A-l system was thought of as a means of gaining administrative System design of the A-l system in- rather than intellectual control over volved two different activities: the the records. The analysts recom- preparation of input forms, and com- mended that subject retrieval receive puter processing. The following de- serious attention only after the prob- scription of the A-l system will not lems of administrative control were deal with the preparation of input solved. forms.4 Computer processing con- Initially, automation was seen as the sisted of the conversion of statistical second of two alternatives; for at first and descriptive data into machine-read- the problem was conceptualized in able form, the manipulation of the ma- terms of manual processing. The ana- chine-readable file, and the produc- lysts estimated the time needed to re- tion of computer printouts. Because produce old series descriptions accord- the A-1 system study called for a com- ing to a standard format and to process puter assisted system rather than a new series descriptions. The cost-ben- computer centered one, batch processing 3 It took four and one-half man-years to complete the system study. 4 GSA Form 6710A, Change of Status Record, was designed for the collection of series description information. Form design was carried out independently of the system design.
The NARS A-l Experience 169 was viewed as the main mode of oper- record contains information describ- ation. ing the record group as a whole; for During the design stage of the A-l example, an organizational history. system, system analysts first consid- Level 9 is the subgroup which is most ered the purposes and objectives of the embedded in the record group's hier- computer printouts. NARS manage- archical structure. Subgroups thus ment decided that the traditional represent such organizational struc- NARS inventory format should be the tures as "office," "division," and main output product. Batch process- "branch." Levels 10, 11, and 12 are ing of the record group descriptions in series, subseries, and sub-subseries, re- inventory format required the ma- spectively; they may be immediately chine-readable file to be arranged se- associated under a record group as a quentially according to a hierarchical whole (level 1) or with any subgroup numbering scheme (an addressable (levels 2-9). Figure 2 illustrates the control number) which reflected the operation of the control number illus- placement of archival series within the trated above. originating agency's organizational The resulting control number shown and/or functional structure. The ad- in Figure 1 identifies the address for dressable control number was a com- the series description of "LETTERS bination of record group number, sub- RECEIVED." record group numbers (eight possible In order to avoid the problem of layers), series number, subseries num- variable length records,5 all the data be- ber, and sub-subseries number. longing to the same hierarchical ad- In Figure 1, the term "Level" iden- dress are linked sequentially by a set tifies the depth of the hierarchically lo- identification code (i.d.) and subset cated record. Level 1 represents the rec- number. A set is a repeatable line of ord group number level. A level 1 the same type of information, such as Figure 1, Control Number Level 1 2 3 4 5 6 7 8 9 10 11 12 RG SUBGROUPS SERIES SUB-S SUB-SUB-S LEVEL CON- A B C D D E F H TROL No. 127 1 1 1 1 1 1 0 0 18 0 0 10 5 Variable length records require special handling called "list-processing," which involves a com- puter check for the end of each record each time it processes the file. Multiply one variable length record times two hundred thousand records to make up the entire file, and it is easy to see that the computer's job would have been slowed down considerably and been costly. For an example of list- processing see Alan Calmes, "A PL/I Free-Field Handling System," in Historical Methods Newsletter 8 (December 1974): 39-47.
170 The American Archivist—April 1979 Figure 2, Record Hierarchy RG 127 Records of the U.S. Marine Corps A. 1 Textual Records B.I Headquarters U.S. Marine Corps C.I Office of the Commandant D.I General Records E.I Correspondence S.I REGISTER OF LETTERS RECEIVED S.18 LETTERS RECEIVED in Figure 3. Each line of information is may be repeated up to 999,999 times one subset of a periodic data set, some- by means of the subset number. Figure times called a "repeating group." In 3 outlines the descriptive narrative and the control number example above, a location register part of the computer set i.d. such as "05," may be attached file format. to the end of the control number to in- A line of text narrative may exist dicate that the following information alone or be followed by an indefinite consists of one narrative line. Subset number of additional lines up to number one attached to the set i.d. in- 999,999. For output-page-width rea- dicates that the information represents sons, each line of text consists of a line number one of a paragraph. A set maximum of seventy-six characters. Figure 3, File Format Control # + Set i.d. + Subset # + DATA Repeating (full control #) 05 000001 (narrative line) group (full control #) 05 000002 (narrative line) (full control #) 05 000003 (narrative line) Repeating (full control #) 06 000001 (location) group (full control #) 06 000002 (location)
The NARS A-l Experience 171 X} - T> cm — « A C • L CT I c 3 0 c 1 gne octa e Cl c L CT « c O 3 i o "O o o < c c e •* i 9U • -* •1 u - - 4> * X. *n « 41 •Jl ) 4J o > M . 4> HI u • •> c r [ c t, D 3 O • c rve ona oo X •P 1) L u in > « t. - O D r m 4J : •* > t * 0 < E m • * - J ^ r cer fee l- > 1 c o o c o : j t. C o 4i i i ti — 15 X o o 1 L T3 t- o « M o •• » ID
172 The American Archivist—April 1979 Within each subset of a set the data oc- retyping whole series entries after each cupies fixed fields of descriptive infor- change. (An archivist submits a draft, mation. Most fields contain uncoded it is keyed into the system, and a print- information. A limited amount of cod- out is returned to the archivist for re- ing reduces the length of some fields; vision. Input operators key-in the for example, a "2" might equal the changes; a batch update cycle replaces phrase "still pictures." or deletes, or inserts data.) With this The General Services Administra- method, the design and implementa- tion (GSA) insisted that NARS obtain tion of which cost about $60,000,6 an- computer services from GSA's Office alysts estimate that it will take the Ar- of Data Systems. Furthermore, GSA's chives twenty years to input systems analysts had to design systems approximately 200,000 series descrip- more from the point of view of their tions of the previously described, the machinery than according to the needs undescribed, and the newly acces- of the system. Therefore, in order to sioned archival records, and be up-to- avoid the limitations imposed on the date with the latter. system by GSA and to accomplish as Hardware much of the A-1 system requirements at the National Archives as possible, The most attractive aspect of com- NARS decided to buy a mini-computer . puter assistance, from the point of view to handle data input and to handle the of the A-l objective, is fast, accurate relatively small statistical file on each data-entry (the process of typing infor- record group. Because the mini-com- mation and creating machine-readable puter is programmable, part of the data).7 About 60 percent of the entire A-1 system is handled on-line. The Of- system's estimated cost consisted of fice of the National Archives maintains data-entry. NARS looked for hardware, statistical control of the record groups i.e., computer machinery, for an in- (by quantity allocation to custodial house, data-entry facility to carry out units) by means of the mini-computer; some limited programmable process- batch processing of the entire inven- ing, especially data verification, edit- tory file for the compilation of series ing, and reformatting. GSA's Office descriptions, however, takes place at a of Procurement studied the problem large computer installation controlled of whether to rent or to buy. On the by GSA. basis of the A-l long-term needs, the The systems analysts designed the office decided it would be cost-effec- series description part of the system tive to buy.8 Over a twenty year pe- around batched text editing and mag- riod, the annual cost of the in-house, netic tape storage. Batched text editing mini-computer, data-entry equipment, is designed to eliminate the need for plus maintenance, will be $10,350 per 6 This figure does not take into account all the GSA, Office of Data Systems, costs which could only roughly be tabulated because GSA did not bill NARS directly for all costs. 7 Reliable sources of information on mini-computers may be found in Data-Entry Awareness Reports, published monthly by Management Information Corporation, 140 Barclay Center, Cherry Hill, NJ 08034; and Datapro, special reports published by Datapro Research Corporation, 1805 Underwood Boulevard, Delvan, NJ 08075. 8 NARS bought a Four Phase, Data IV/90 computer for $147,000. It included a 96K byte processor/ memory, 67.5 million byte direct access disk, 1600 BPI tape drive, 300 lines per minute printer, and eight CRT/KEY stations, capable of displaying and processing both upper and lower case alphabetic
The NARS A-l Experience 173 year. Rental would have been $41,000 of a range, and for comparisons to per year. stored values. Automatic field genera- tion for repetitive and incremented Software fields, automatic insertion of con- stants, automatic duplication of data, The A-1 system uses data-entry soft- and a key-verify mode10 with immedi- ware (a collection of computer pro- ate, direct access correction are other grams) that came with the hardware.9 required features. In summation, the The software allows the input operator hardware/software combination pro- to see projected on a television-like vides for quick and accurate data-en- screen questions (prompts) which ask try. for specific information to be read off the input form and typed into the computer. The typed data appear on Data Entry Process the screen and the software checks to A key-entry operator takes the see if the data meets a set of criteria source document (GSA Form 6710A, previously programmed into the ma- Change of Status Record) containing chine. If the data fail the validation- archival descriptions of records, and check, the machine sounds an alarm. keys the data into a television-like With the machine in the "search" screen, field by field. Some fields are mode, the operator makes the com- repetitive and do not require rekeying. puter locate a particular record stored A format control program edits the on disk by a unique identification code, data—right-justifies and zero-fills nu- field, or logical search strategy, after meric fields—and restarts the display which the operator inserts, corrects, for the next source document. The changes, or deletes one character at a work-cycle consists of data-entry, veri- time. Because several input operators fication, and batch input to the master may be doing a variety of tasks, the data-base. software/hardware combination has to With four input operators, the A- allow for simultaneous data-entry into system averaged 6,000 finished series different formats and into the same descriptions per year during its first format, while background processing three years of production. There was takes place and data is being trans- an average of ten lines per series con- ferred to the tape-drive or printer. Au- sisting of an average of fifty keyed tomatic editing is another desirable characters per line. Each series, with feature—checking for alphabetic or identifying subrecord group titles and numeric characters, checking for limits text, averaged 500 characters. Thus characters. A short-term five year project would be better off with rented equipment. Due to rapid changes in computer technology, especially in the mini-computer field, and the decreasing cost of computer parts, it may be best to rent for two or three years and then obtain a new contract or a new mini-computer. 9 Four Phase provided a wide variety of software including data-entry formatting, editing, and re- formatting. It also provided COBOL programming, tape read-and-write instructions, and format in- structions to the printer. Four Phase trained the data-entry operators and the supervisory computer operator, and distributed manuals. 10 Key-verify mode consisted of a second typing. The input operator typed over the previously typed data which was disguised by scrambled letters; and, when the keyed character disagreed, the machine alerted the operator that an inconsistency existed. The operator would than sight verify the particular character in question and correct the mistake.
174 The American Archivist—April 1979 ten lines times fifty characters, times gram creates a new, updated, sequen- six thousand series, equals three mil- tial file on tape for storage. lion characters. The data on the correction/update tape are replaced and/or merged into Data Base Maintenance the addressed sequential slots of the The master machine-readable file of "father" data-base tape. A new, re- record group inventories, stored as a vised tape called the "son" tape, in- data base on magnetic tape at a large cluding the revisions and additions, computer installation, receives a becomes the latest revision of the data- monthly update of new and revised base. During the process, the com- data in batch processing mode. As il- puter also produces printout reports lustrated in Figure 5, the update trans- of the actions taken. The old data-base actions cause the data-base tape to be tape remains unchanged as a "father" read into temporary disk storage. The tape. Backup tapes go back to the insertions, replacements, and deletions "grandfather" generation. A "great take place on the disk. Then the pro- grandfather" tape is recycled into a Figure 5, Flow Chart "FATHER" , MASTER I TAPE J INPUT TEMPORARY COMPUTER DISC MANIPULATION STORAGE OUTPUT NEW "SON" MASTER PRINTOUT TAPE
The NARS A-l Experience 175 new, current, master, data-base tape in the Archives building, however, (the "son" tape). After an update eventually it may be possible to main- cycle, a listing of valid and invalid tain the data in-house and to have on- transactions instructs the input opera- line, control number access to each se- tors to make corrections. For example, ries description. a new line of narrative description might be entered with a used subset number. This transaction would be re- Output jected as an invalid transaction, and For the most part, output means the operator would have to change the printout pages and microfiche. The subset number to an unused number. batch mode computer programs at the The transaction would have to be GSA-controlled computer produce reentered with the next update cycle. printout pages or place print-page im- This error could be avoided by the op- ages on magnetic tape. The Archives V erator's check of the current, master personnel may either print the mag- J\ «file. The master file, however, is so netic tape on paper at the in-house ™ I large that printouts are impractical and mini-computer or hire a service con- ^ 1 microfiche must be used instead. The tractor to produce computer output ^ I example of the error above results microfiche (COM). The entire file is from misreading one of the thousands placed into inventory format about of lines of data identified by the forty- every three months. Microfiche is made nine character control-number; mis- twice a year. Smaller reprints are run reading one of the digits causes the locally by the mini-computer, where a mistake. great deal of diversity of format is pos- The A-l data base will grow in size sible. Together, the various forms of to about one hundred million charac- output cost about $5,000 per year, in- ters over the next twenty years. This cluding paper and microfiche. will require the machine-readable file to be segmented by record group. It will become inefficient to keep the en- tire file in one sequential series of tapes Personnel which will have to be read into the An overwhelming part of the A-l computer and put on disk. Instead, it system involves people. An in-house will be better to have a separate tape task force of program analysts carried for each record group or group of rec- out the initial study. The system re- ord groups. Segmentation of the file ceived a wide range of participation will require some program changes. and input from archivists throughout This will have to be done by GSA pro- the National Archives. Several system grammers, however, and will consti- analysts worked on the system design. tute an extra cost and some trial and Computer programmers and opera- error programming and testing; this tors wrote programs for the GSA-con- will result in some "down time." Fur- trolled computer. At the Archives, four thermore, the GSA programmers will key-entry operators and one supervi- have to make constant adjustments to sor are carrying out the task of coding the programs after each hardware en- and keying-in the data. hancement and after each new soft- After three years experience build- ware release. With the development of ing the machine-readable file of series the mini-computer base of operations descriptions, the rate of production
176 The American Archivist—April 1979 appears to be very slow.11 An average Total Operating Costs rate of 375 characters in final form per 1 hour is based on the production of six thousand series descriptions per year The Archives personnel cost of $44,500 annually amounted to nearly 60 percent of the total annual costs. by four input operators during a two The cost of a series entry was com- thousand hour work-year. The rate of puted by dividing the six thousand se- production is low because the data are ries placed into the system each year by keyed twice (once entered and once the total annual cost, producing $12.39 verified); furthermore, the update per series. Based on an average of 500 cycle requires re-input of invalid trans- characters per series, each final char- actions each time a line of data fails to acter costs the government almost two go on the machine-readable file. The and a half cents ($.0248). Each char- A-l data-base is constantly being acter was keyed at least twice (entry changed; some series descriptions are and verify) and some three or four revised several times a year. Finally, times for revision and correction. personnel turnover and training also slow the rate of production, as does computer down-time. Comparison of the A-l Experience with the A-l Study Projections The A-l personnel costs were $44,500 per year, broken down as fol- The A-1 experience costs were fairly lows: $8,000 per entry operator, times close to the A-l study projected costs.12 four operators, equaling $32,000 per The total estimated costs, though year. The supervisor cost $12,500 per aligned somewhat differently from ac- year. tual costs, are shown in Figure 7. Figure 6, Annual Cost System Study* $ 3,500 System Design and Implementation* 3,000 Hardware/Software* 10,350 Data Entry Personnel 44,500 Data-base Maintenance 8,000 Output 5,000 Total Annual Cost $74,350 *Total cost distributed over a twenty-year period. 11 A formula for estimating time and personnel needed to accomplish a comparable job would be: number of lines per average entry (a) times number of characters to be keyed in each line (b) times number of entries (c) divided by the constant 750,000 equals the number of man-years required (d) to produce a finished product, ( a x b x c v 750,000 = d). The constant 750,000 was derived by dividing the three million characters of the six thousand series descriptions by four input operators. 12 Weiher, vol. 4, chapter 5.
The NARS A-l Experience 177 Figure 7, Estimated Costs TOTAL ANNUAL System Study $ 70,000 $ 3,500 Program development 103,000 5,150 Data Entry 868,000 43,400 Data base maintenance 8c output 127,725 6,386 $1,168,725 $58,436 (or $9.75 per series) There was no allowance in the pro- puter assistance, the overall cost of the jection for data-entry hardware. There system is high. If a fully automated sys- was mention of an unspecified amount tem with on-line retrieval by index for "other costs," which covered the terms had been implemented, the cost projected cost of some sort of type- would have been excessively high, and writer to produce machine-readable the production rate so slow that it data. The cost of data-entry equip- would have taken sixty years to catch ment left out of the study may acount up. The A-l system is a fair warning, for the difference of $2.65 per series therefore, that automation of archival between the study and the actual im- finding aids must be approached care- plementation cost of the A-l system. fully. The size of the data-base and the cost per keystroke must be calculated in advance to see if there is enough Conclusion time, enough people, and enough Though the automated part of the money to convert the finding aid in- NARS A-l system is limited to com- formation into machine-readable form.
You can also read