Filtered by category: Tech Issues/Tips

Portal surgery

I was recently told to delete thousands upon thousands of records from the "Catholic Portal", and through the magic of Solr's Web-based API and a full-featured HTTP client I was able to do this surgery with laser-beam accuracy.

Specifically, I needed to delete all of the records in the Portal from the University of Notre Dame Archives because the Archives wanted to totally replace what finding aids were available. This meant deleting more than 100,000 records from the underlying index. After a bit of investigation, I learned that the following one-liner from the command line would do the trick:

Read More

Indexing EAD files in the "Catholic Portal" with VUFind

This posting describes how EAD files are indexed in the "Catholic Portal" with VUFind.

VUFind is a "next-generation library catalog" or "discovery system" application. Its primary purpose is to index bibliographic metadata and provide a reader-friendly interface to the result. The heart of this process is a Solr index made up of many bibliographic-like fields. These fields are the usual suspects including a host of variants on author, title, institution, building, collection, language, format, physical description, publisher, published date, edition, description (note), contents, URL, call number, ISSN, ISBN, OCLC number, series, topic, genre, geographic, era, illustration, full text, and record type. In order for EAD files to be searchable in the Portal, they need to have their metadata extracted, the metadata needs to be mapped to Solr fields, and the metadata needs to be added to the index. The balance of this posting describes this in more detail.
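
The extract-and-map step can be sketched in a few lines of Python. This is only an illustration, not the Portal's actual indexer: the sample EAD fragment, the element paths, and the Solr field names in FIELD_MAP are all assumptions chosen to resemble the fields listed above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed EAD fragment; real finding aids are far richer.
SAMPLE_EAD = """<ead>
  <eadheader><eadid>XYZ-001</eadid></eadheader>
  <archdesc level="collection">
    <did>
      <unittitle>Example Family Papers</unittitle>
      <origination><persname>Example, Jane</persname></origination>
      <physdesc>3 linear feet</physdesc>
      <repository>Example University Archives</repository>
    </did>
    <scopecontent><p>Correspondence and diaries.</p></scopecontent>
  </archdesc>
</ead>"""

# Assumed mapping from Solr field names to element paths inside the EAD file.
FIELD_MAP = {
    "id": "eadheader/eadid",
    "title": "archdesc/did/unittitle",
    "author": "archdesc/did/origination/persname",
    "physical": "archdesc/did/physdesc",
    "institution": "archdesc/did/repository",
    "description": "archdesc/scopecontent/p",
}


def ead_to_solr_doc(xml_text):
    """Extract metadata from an EAD string and map it to Solr-style fields."""
    root = ET.fromstring(xml_text)
    doc = {"recordtype": "ead"}  # assumed value marking the record type
    for field, path in FIELD_MAP.items():
        node = root.find(path)
        if node is not None:
            # Join nested text so inline markup inside an element is not lost.
            text = "".join(node.itertext()).strip()
            if text:
                doc[field] = text
    return doc


solr_doc = ead_to_solr_doc(SAMPLE_EAD)
print(solr_doc)
```

The resulting dictionary is the sort of thing that would then be sent to Solr to be added to the index; that last step is left out here.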

Read More

VUFind and sitemaps

In an effort to improve SEO (search engine optimization) I have done my best to implement sitemaps against the "Catholic Portal's" VUFind implementation.

Sitemaps are XML files listing all the individual files/resources of a website. The intention and structure of these files are documented at Sitemaps.org. By exposing a site's content in this way Internet robots/spiders can slurp up sitemap files' URLs, go directly to the resources without crawling, and index the content found there. In short, sitemaps make it easier for Internet indexers to do their job.
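
To make the structure concrete, here is a minimal Python sketch that builds a sitemap from a list of URLs. It is not the script used for the Portal; the build_sitemap function and the example.org record URLs are made up for illustration, and only the urlset/url/loc elements and the namespace come from the Sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # special characters are escaped on output
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical record URLs; example.org stands in for the real site.
record_urls = [
    "http://example.org/Record/abc123",
    "http://example.org/Record/def456",
]
sitemap_xml = build_sitemap(record_urls)
print(sitemap_xml)
```

A real sitemap file would also begin with an XML declaration, and the protocol limits each file to 50,000 URLs, so a large index would need several files plus a sitemap index.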

Read More

VUFind, version 1.1 or so

I believe I have just finished upgrading the production version of VUFind -- the software driving the "Catholic Portal" -- to version RC3107 which is somewhere between version 1.1 and 1.2. This upgrade addresses at least a couple of usability issues, specifically:

  1. wording regarding the linking of online finding aids
  2. toggling the check box associated with filters

With this version there are also quite a number of additional records in the underlying index -- around 280,000. This is because the finding aids (EAD files) have been indexed more completely.

Read More

How to upgrade VUFind

These are notes (to myself, mostly) on how to upgrade VUFind from a local "sandbox" version to a production version. But they are also documented here, just in case I win the lottery and start enjoying umbrella drinks on some Caribbean island.

Read More

PastPerfect

This posting outlines the possibilities for ingesting PastPerfect content into the "Catholic Portal".

As membership in the Catholic Research Resources Alliance (CRRA) grows, so does the number of metadata formats the "Catholic Portal" is expected to support. When the CRRA was just beginning MARC was the predominant metadata format. After the content of university archives was recognized as significant, EAD became very important. Some institutions use neither MARC nor EAD to describe their special collections but instead use systems like ContentDM. These sorts of things are often accessible via OAI-PMH, and thus, at the very least, harvestable Dublin Core is available. In order to support discovery, all of these types of metadata need to be parsed, mapped to VuFind's underlying Solr schema, and indexed.

Read More

Harvesting metadata

It is imperative for CRRA member institutions to make their metadata available for harvesting via a Web server.

A couple of years ago, when the "Portal" was just beginning, the modus operandi for ingesting MARC and EAD metadata was to send it to Notre Dame, save it on local hard disk, and index it. That process worked then, but as we grow it becomes less and less scalable.

Read More

"Catholic Portal" usability efforts

This page has become the home page for the usability efforts of the "Catholic Portal".

The Digital Access Committee had a conference call on Thursday, May 12. The purpose of the meeting was to discuss usability studies. The resources (time and money) required to do the studies were emphasized. Similarly, the need to have the studies done with the intended audience of the Portal -- upperclassmen, graduate students, faculty, and scholars -- was also stressed.

Read More

CRRA-Tech

This is the home page for a mailing list called CRRA-Tech.

The Catholic Research Resources Alliance (CRRA) or "Catholic Portal" brings together data and metadata for the purposes of Catholic research and scholarship. This process is facilitated through a number of groups dealing with administrative issues, collection issues, metadata issues, etc. CRRA-Tech is a mailing list intended to support and discuss the computer technology issues of the CRRA such as but not limited to the harvesting of content and metadata, the validation of content and metadata, indexing technologies, library "discovery systems", the programming languages (PHP, Java, Perl, and JavaScript) used, log file analysis, cascading stylesheets, debugging tools, the role of open source software, etc. In short, CRRA-Tech provides a forum for discussing the computer infrastructure of the Portal.

Read More

Doing usability against the "Catholic Portal"

This posting describes a process for iteratively studying usability issues against the "Catholic Portal" with the expectation that it will be applied by each institutional member of the Digital Access Committee within the current calendar year. The posting is divided into the following sections:

This document is also available as a PDF document for printing, a second PDF document designed as a set of slides, and just for fun, an EPUB file for your mobile device.

Read More

Usability testing

As we move the "Portal's" sandbox implementation into production we plan on doing some usability testing. Below are the questions we will be asking:

  1. Identify the library or archive holding the papers of Dorothy Day.
  2. Find a record whose author is Graham Greene. Create an account, then add the Graham Greene record to your favorites, tagging it as "ggreene."
  3. Locate resources, including primary resources, on the Catholic Conference for Interracial Justice.
  4. Find a set of records on the topic of "Catholic social action." Choose 1-3 from the retrieved set and email them to yourself for future reference.
  5. Locate materials on the topic of sermons and the Lutheran church.
  6. Who owns "Our Sunday Visitor Records"? What telephone number would you call in order to schedule a time to visit the collection?
  7. Which library has the most French-language materials in the "Portal"?
  8. What is the most frequently used word in the pamphlet owned by Notre Dame entitled "Pastoral instruction for the application of the Decree of the Second Vatican Ecumenical Council on the Means of Social Communication"? (hint: see the record with the call number BV 4319).
  9. How would you describe the overall scope of the collection?

Wish us luck.

Data warehousing Web server log files

I have begun to create a data warehouse for CRRA (VuFind) Web server log files. This posting introduces the topic.

The problem

There is an understandable need/desire to know how well the "Catholic Portal" is operating. But for the life of me I was not able to enumerate metrics defining success. On the other hand, Pat Lawton had no problem listing quite a few. Here are most of her suggestions:

Read More

VuFind, OAI-PMH, and the "Catholic Portal"

Without undue difficulty I have been able to harvest metadata from a ContentDM site via OAI-PMH, index the data in Solr, and successfully search & retrieve this metadata in VuFind all for the "Catholic Portal". This posting outlines how I did this and why it is important.
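
As a rough illustration of the middle step, the Python sketch below turns an OAI-PMH ListRecords response carrying Dublin Core into Solr-style documents. It is not the code used for the Portal: the sample response is invented, the function name is hypothetical, and the id, title, author, and topic field names are assumptions about the Solr schema. Only the three XML namespaces are taken from the OAI-PMH and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented ListRecords response; a real harvest would fetch this over HTTP.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:collection/1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example pamphlet</dc:title>
          <dc:creator>Example, John</dc:creator>
          <dc:subject>Sermons</dc:subject>
          <dc:subject>Catholic Church</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def records_to_solr_docs(xml_text):
    """Map each harvested Dublin Core record to a dict of assumed Solr fields."""
    root = ET.fromstring(xml_text)
    docs = []
    for record in root.iterfind(".//oai:record", NS):
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:  # deleted records carry a header but no metadata
            continue
        docs.append({
            "id": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": dc.findtext("dc:title", default="", namespaces=NS),
            "author": [e.text for e in dc.findall("dc:creator", NS)],
            "topic": [e.text for e in dc.findall("dc:subject", NS)],
        })
    return docs


docs = records_to_solr_docs(SAMPLE_RESPONSE)
print(docs)
```

A complete harvester would also follow resumptionToken elements to page through large result sets, which this sketch ignores.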

Background

Read More

Simple log file analysis

Today I did a bit of simple log file analysis against the Portal's Apache log file. Specifically, I wanted to extract the queries people have been using.

Naturally, I wrote a program to do this work -- parse.pl. It is rather brain-dead and certainly not 100 percent accurate, but it does generate a report of some value.
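
The general idea can be sketched in Python as follows. This is not parse.pl, just a comparable and equally simple-minded approach: it assumes the log is in Apache's common or combined format and that searches arrive as a query-string parameter named lookfor, and the sample log lines are invented.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Matches the request portion of an Apache common/combined log line.
REQUEST = re.compile(r'"(?:GET|POST) (\S+) HTTP/[\d.]+"')


def extract_queries(lines, param="lookfor"):
    """Count the search terms found in the given log lines.

    The parameter name "lookfor" is an assumption about how the site
    encodes searches in its URLs.
    """
    counts = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if not match:
            continue
        params = parse_qs(urlsplit(match.group(1)).query)
        for value in params.get(param, []):
            value = value.strip().lower()  # fold case so variants are counted together
            if value:
                counts[value] += 1
    return counts


# Invented log lines for illustration only.
sample_log = [
    '192.0.2.1 - - [10/Oct/2011:13:55:36 -0400] "GET /Search/Results?lookfor=dorothy+day&type=AllFields HTTP/1.1" 200 5120',
    '192.0.2.2 - - [10/Oct/2011:13:56:01 -0400] "GET /Search/Results?lookfor=Dorothy%20Day HTTP/1.1" 200 4811',
    '192.0.2.3 - - [10/Oct/2011:13:57:12 -0400] "GET /images/logo.png HTTP/1.1" 200 902',
]
query_counts = extract_queries(sample_log)
for query, count in query_counts.most_common():
    print(count, query)
```

Like the original, this will miscount in places; for example, it makes no attempt to separate robots from people.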

Read More

Catholic pamphlets and the "Catholic Portal"

This posting outlines a possible workflow for getting digitized versions of Notre Dame's Catholic pamphlets into the "Catholic Portal".

The problem

The University of Notre Dame owns a significant number of Catholic pamphlets. These materials have been cataloged and denoted as destined for the "Portal" in their MARC records with the letters "CRRA" in field 590$u.

Read More

VUFind record drivers and templates

This posting documents how I wrote and edited a couple of VUFind record drivers and Smarty templates for the "Portal" of the Catholic Research Resources Alliance. In writing this posting I hope to support any developer coming behind me as well as inform the wider open source community on how VUFind works.

The Problem

Read More

Text mining Catholic pamphlets

This is the quickest of blog postings outlining how I am initially providing a text mining interface to digitized Catholic pamphlets.

Jean McManus used a scanner to create PDF versions of a few Catholic pamphlets. Along the way, she also had the software do a bit of OCR. She then gave the PDF documents to me with filenames matching MARC 001 fields.

Read More

Internet Archive content, VUFind (Solr), and text mining

This posting outlines how I have: 1) mirrored metadata and full text content from the Internet Archive, 2) made the mirrored content accessible through VUFind, and 3) implemented a rudimentary text mining interface against the mirror.

Background

The "Catholic Portal" is intended to be a research tool centered around "rare, unique, and uncommon" materials of a Catholic nature. Many of these sorts of things are older as opposed to newer, and therefore, many of these things are out of copyright. Projects such as Google Books and the Open Content Alliance specialize in the mass digitization of out of copyright materials. By extension we can hope some of the things apropos to the Portal have been digitized by one or more of these projects.

Read More

Names & addresses

This posting outlines how the names & addresses of the "Catholic Portal" are made available. The purpose of this posting is mostly documentation. Documentation for myself, since I always forget. And documentation so somebody else can do the work after I win the lottery and move to the beach to drink cocktails with umbrellas in them.

Here goes:

Read More

Indexing MARC and EAD in VUFind with Solr for the CRRA

This posting outlines how I am currently indexing MARC and EAD files in VUFind with Solr for the CRRA. (Boy, there are a lot of acronyms in that sentence!)

Background

The Catholic Research Resources Alliance (CRRA) is a member-driven organization with the purpose of making available "rare, unique, and uncommon" research materials for Catholic scholarship. Presently the membership is primarily made up of libraries and archives who pool together their metadata records, have them indexed, and provide access to the index. My responsibility is to build and maintain the technical infrastructure supporting this endeavor.

Read More