How to Make MARC and EAD Files Available

Introduction

At its core, the “Portal” is an index — a list of pointers to content items. Access to this index is implemented through a form-based interface. Readers enter queries into the form, and items are returned. Readers are then expected to select items of interest from the returned list, and use them for the purposes of research and scholarship. In order to implement this functionality, each content item in the index requires, at the very least, three elements: 

  1. a unique identifier,
  2. a human-readable description of the item, and
  3. a location code where the item can be acquired.

The MARC and EAD metadata schemes are well-suited for indexing. After making sets of MARC records and/or EAD files transparently accessible on a Web server, it is easy to harvest the metadata, integrate it into the Portal’s index, and provide access to the content items.

The balance of this document describes how to make MARC and EAD files available for harvesting.

1.2.1. MARC

Here’s the short version. Export from your integrated library system all the MARC records relevant to the “Catholic Portal”, making sure they are encoded using the UTF-8 character set. Save the resulting file in your MARC folder on the CRRA file server.

Here’s the long version. Remember, every record in the Portal needs a unique identifier, a human-readable description, and a location code. For MARC records, this means every record needs:

  1. A value in the 001 field. Any value will do as long as it is unique within your set of records.
  2. A value in the 245 field. At the very least, this serves as the human-readable description; all the other descriptive and analytic fields supplement it.
  3. A location code, which is the item’s call number. This value will most likely be extracted from the 090 field.
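
The three requirements above can be checked on an exported file before uploading it. The following is a minimal sketch, not the Portal’s own code: it assumes the export is a binary MARC (ISO 2709) file, that the call number lives in the 090 field, and the function names are ours.

```python
REQUIRED_TAGS = ("001", "245", "090")  # identifier, description, call number (assumed in 090)
RECORD_END = b"\x1d"                   # ISO 2709 record terminator


def split_records(data: bytes) -> list[bytes]:
    """Split the contents of a binary MARC file into individual records."""
    return [chunk + RECORD_END for chunk in data.split(RECORD_END) if chunk.strip()]


def parse_fields(record: bytes) -> list[tuple[str, bytes]]:
    """Return (tag, raw field data) pairs by walking the record's directory."""
    base = int(record[12:17].decode("ascii"))  # leader 12-16: base address of data
    directory = record[24:base - 1]            # 12-byte entries, minus the terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        # The stored length includes the trailing field terminator; drop it.
        fields.append((tag, record[base + start:base + start + length - 1]))
    return fields


def check_records(data: bytes) -> list[str]:
    """List missing required fields and duplicate 001 values in a MARC file."""
    problems = []
    seen_ids = set()
    for number, record in enumerate(split_records(data), start=1):
        fields = dict(parse_fields(record))  # one entry per tag is enough here
        for tag in REQUIRED_TAGS:
            if not fields.get(tag, b"").strip():
                problems.append(f"record {number}: missing {tag}")
        identifier = fields.get("001", b"").strip()
        if identifier:
            if identifier in seen_ids:
                text = identifier.decode("utf-8", "replace")
                problems.append(f"record {number}: duplicate 001 {text}")
            seen_ids.add(identifier)
    return problems
```

To use it, read the exported file in binary mode and pass its contents to check_records(); an empty list means every record has all three fields and no 001 value is repeated.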

Once you have identified which MARC records to extract from your integrated library system, we recommend flagging them with a local note. The University of Notre Dame, for example, adds the letters CRRA to MARC 590 subfield a. Once this is done, it is relatively easy for the systems librarian to search for CRRA in 590 subfield a and download the resulting records to a file. Alternatively, the systems librarian might search for all items whose call numbers begin with BX and download the resulting set. The process you use to flag and export your MARC records depends on your local environment.

When exporting your MARC records from your integrated library system, it is imperative that the records be encoded using the UTF-8 character set. The Portal’s underlying indexer does not handle other encodings well. If your system exports records in MARC-8 rather than UTF-8, use yaz-marcdump, an open source application from Index Data, to convert them from one encoding to the other. Once yaz-marcdump is installed, you can execute a command like the following to do the transformation:

yaz-marcdump -f MARC-8 -t UTF-8 -o marc -l 9=97 input.mrc > output.mrc

The command translates MARC records from (-f) MARC-8 encoding to (-t) UTF-8 encoding. It outputs (-o) the result as MARC records, and inserts the letter a (ASCII character 97) into the leader (-l) at position 9, which marks each record as Unicode-encoded. It uses the file named input.mrc as input, and it writes the result to a file named output.mrc.
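
After the conversion, it can be worth confirming that the result really is UTF-8. The following is a minimal sketch under our own assumptions (the function name is ours, and the file is a binary MARC file like output.mrc above): it checks that leader position 9 holds the letter a and that the bytes of each record decode as UTF-8. Passing both checks is necessary but not sufficient, since a MARC-8 record containing only plain ASCII also decodes cleanly.

```python
RECORD_END = b"\x1d"  # ISO 2709 record terminator


def encoding_problems(data: bytes) -> list[str]:
    """Report records that are not flagged as, or do not decode as, UTF-8."""
    problems = []
    records = [chunk for chunk in data.split(RECORD_END) if chunk.strip()]
    for number, record in enumerate(records, start=1):
        flag = record[9:10]  # leader position 9: character coding scheme
        if flag != b"a":
            problems.append(
                f"record {number}: leader position 9 is {flag!r}, expected b'a'"
            )
        try:
            record.decode("utf-8")
        except UnicodeDecodeError as error:
            problems.append(
                f"record {number}: not valid UTF-8 at byte {error.start}"
            )
    return problems
```

Read output.mrc in binary mode and pass its contents to encoding_problems(); an empty list means no problems were detected.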

Every time you export your records, you should export everything that you feel is relevant to the Portal. Do not worry about tracking additions, changes, or deletions.

After the records have been exported, transfer the data to your MARC folder on the CRRA file server (crra.andornot.com) using the Secure File Transfer Protocol (SFTP). We recommend that you use FileZilla to upload the data. Delete your obsolete files on the SFTP server and upload the new ones. Ask Steve Lapommeray, the Portal Administrator, if you need help ([email protected]).

1.2.2. EAD

Here’s the short version. Use validated EAD files to encode the content you deem relevant to the Portal. Save all the EAD files in your ead folder on the CRRA file server, making sure each file is given a .xml extension.

Here’s the long version. Use whatever tool you like to create EAD files describing the archival content you deem appropriate for the Portal. Any number of editors and applications can help with this process. Make sure the resulting EAD files validate against either the EAD DTD or the EAD schema. It doesn’t really matter which one, but right now validation against the DTD is easier to handle here at Portal Central.
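
Before uploading, a quick pre-check of your folder can catch the most common slips. The following is a minimal sketch, and the function name is ours. It only tests that each file has a .xml extension and is well-formed XML; it does not validate against the EAD DTD or schema, which requires a validating parser (for example, xmllint with its --valid and --noout options).

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def precheck_ead_folder(folder) -> list[str]:
    """Report files lacking a .xml extension or failing a well-formedness parse."""
    problems = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        if path.suffix != ".xml":
            problems.append(f"{path.name}: extension is not .xml")
            continue
        try:
            ET.parse(path)  # well-formedness only; no DTD or schema validation
        except ET.ParseError as error:
            problems.append(f"{path.name}: not well-formed ({error})")
    return problems
```

Call precheck_ead_folder() with the path to the local folder holding your EAD files; an empty list means every file passed both checks.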

Each <did>-level element in your EAD files will eventually become a record in the Portal’s index. During pre-processing here at Portal Central, a unique <unitid> value will be added to each <did>-level element that does not already have one. This pre-processing satisfies the need for unique identifiers, so there is nothing you need to do with regard to them.

Each <did>-level <unittitle> element will be recursively combined with the <unittitle> of its parent <did> element to form a human-readable description of each content item. Consequently, there is nothing you need to do with regard to human-readable descriptions.
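
To get a feel for what the combined descriptions might look like, here is a minimal sketch of the idea. It is not the Portal’s actual pre-processing: the function names, the parent-first ordering, and the " / " separator are all our assumptions.

```python
import xml.etree.ElementTree as ET

SEPARATOR = " / "  # assumed separator; the Portal may join titles differently


def local_name(tag: str) -> str:
    """Strip any XML namespace so DTD- and schema-based EAD both work."""
    return tag.rsplit("}", 1)[-1]


def own_title(element):
    """Return the <unittitle> text of the element's own <did>, or None if no <did>."""
    for child in element:
        if local_name(child.tag) == "did":
            for part in child:
                if local_name(part.tag) == "unittitle":
                    # Collapse whitespace and include text of nested elements.
                    return " ".join("".join(part.itertext()).split())
            return ""  # a <did> without a <unittitle>
    return None


def combined_titles(element, inherited=()) -> list[str]:
    """Walk the tree, prefixing each title with the titles of its ancestors."""
    results = []
    path = inherited
    title = own_title(element)
    if title is not None:
        path = inherited + ((title,) if title else ())
        results.append(SEPARATOR.join(path))
    for child in element:
        if local_name(child.tag) != "did":
            results.extend(combined_titles(child, path))
    return results
```

Given a parsed EAD file, combined_titles(ET.parse("finding-aid.xml").getroot()) returns one description per <did>-level element, where finding-aid.xml is a placeholder file name.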

The location of items found in EAD files is conveyed in three ways:

  1. The name of your hosting institution and library/archive is associated with each search result. This satisfies the need for location information, but only in a rudimentary way.
  2. The url attribute of the <eadid> element reinforces the location information. You are expected to include a value in this attribute, and the value is expected to point to a human-readable version of your EAD file on your Web server. Portal search results include hot links with a label similar to “View finding aid at owning institution”, and these links use the value of the url attribute. Your human-readable version of the EAD file is in turn expected to include instructions and contact information describing how to acquire items of interest.
  3. Search results include a second hot link with a label similar to “View finding aid in Portal display”. This link points to a local HTML file transformed from the original EAD. Again, location and contact information should appear in the HTML because it was part of the original EAD.

In summary, create complete and valid EAD files, making sure you include values in the url attributes of the <eadid> elements.
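
Because the url attribute is the one thing you must supply yourself, it is worth checking before you upload. The following is a minimal sketch, and the function name is ours. It assumes the value should be an http or https address, which the Portal may or may not require.

```python
from typing import Optional
from urllib.parse import urlparse
import xml.etree.ElementTree as ET


def eadid_url_problem(xml_data) -> Optional[str]:
    """Return a description of what is wrong with the <eadid> url, or None if it looks fine."""
    root = ET.fromstring(xml_data)
    for element in root.iter():
        # Compare on the local name so namespaced (schema-based) EAD also matches.
        if element.tag.rsplit("}", 1)[-1] == "eadid":
            url = (element.get("url") or "").strip()
            if not url:
                return "eadid has no url attribute value"
            parts = urlparse(url)
            # Assumption: a usable value is an absolute http(s) address.
            if parts.scheme not in ("http", "https") or not parts.netloc:
                return f"eadid url does not look like a Web address: {url}"
            return None
    return "no eadid element found"
```

Pass the contents of an EAD file to eadid_url_problem(); a result of None means the <eadid> element carries a plausible Web address.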

Once you have created your EAD files, transfer them to your ead folder on the CRRA file server (crra.andornot.com) using the Secure File Transfer Protocol (SFTP). We recommend that you use FileZilla to upload the data. Ask Steve Lapommeray, the Portal Administrator, if you need help ([email protected]).

CONTACT INFORMATION

If you have questions along the way, don’t hesitate to contact Kevin or Steve:
Kevin Cawley, Digital Access Committee Chair, [email protected]
Steve Lapommeray, Portal Administrator, [email protected]