This file provides technical information about accessing digital images from the OPenn website, and about the conventions and standards used in creating the data.
Licenses and use
All images and metadata are released under licenses that Creative Commons has approved for Free Cultural Works, bearing:
- the CC Public Domain mark
- CC0 ('CC-zero'), the Public Domain dedication for copyrighted works
- CC-BY, the Creative Commons Attribution license
- CC-BY-SA, the Creative Commons Attribution-Share Alike license
You are free to download and use the images and metadata on this website under the license assigned to each document. You do not need to apply to the holding institutions prior to using the images. We do ask that whenever possible you cite this website and the holding institution when you use any of these resources.
On this website, you will find material from several institutional collections. To determine the license under which images have been released, please refer to each repository's web page on OPenn.
Accessing the data
Data on this site can be accessed in a number of ways: via the HTTP web site, anonymous FTP, and the RSYNC remote synchronization utility. Each of these is discussed below.
Users who want to do more than casual browsing using the site’s HTML pages should understand its directory structure. The site's organization is:
Within each document directory, document images and metadata are presented in a structured package, which is described below.
HTTP Access
Individual manuscript images can be viewed and downloaded from this site using a Web browser. Site navigation guides are in the How to use this data set section of the ReadMe file.
There are useful tools that will allow you to perform bulk downloads of whole documents, select document images, and entire sections from OPenn over HTTP. One of these is wget, which can be run on Mac OS, Windows, and Linux computers. Instructions for installing and using wget are provided below in the section 'Appendix: Downloading files with wget'.
Anonymous FTP
FTP is a convenient method for doing bulk download of files and whole directories of files. OPenn is accessible via anonymous FTP at openn.library.upenn.edu:
Note that no password is needed.
Free graphical FTP clients are available for all major commercial and free operating systems. For configuration of FTP client software, use the standard FTP network port, 21.
Anonymous RSYNC
RSYNC is an application for synchronizing files between computer systems and is probably the best tool to use for bulk retrieval of data from OPenn.
All data on OPenn is accessible via anonymous rsync. From the command line on Unix systems the following command can be used to list OPenn files.
See the section 'Appendix: Downloading files with rsync' below for more information on using rsync.
File naming conventions
Image files have names like:
Each image has a base name consisting of a document identifier (e.g., 0284), an underscore, and a serial number (e.g., 0003). Each of the files that share a base name is a different version of the same image. Serial numbers are in a natural order, such as book page order. For example, if an entire book has been imaged including cover, then the first serial number (0000) is assigned to the outside front cover, the second serial number (0001) to the inside front cover, and so on.
Note that the parts of a document that are imaged and their order will depend on the providing institution's practice and policies. The order and description of each image will be given in the <facsimile> section of each document's TEI description. See below for more information on document descriptions.
The rest of the file name indicates the derivative and file type of the image. Images are either TIFF (.tif) or JPEG (.jpg). There are three derivative types. They are:
- a full-sized master image, typically a TIFF;
- a web JPEG image that is 1800 pixels on its longest side; and
- a thumbnail JPEG that is 190 pixels on its longest side.
The file names indicate the derivative type through a tag, which is the last segment of the file name before the extension .tif or .jpg. The tag is web for the web JPEG, and thumb for the thumbnail JPEG. The master image has no tag.
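The naming convention described above can be checked mechanically. The following is a minimal Python sketch, not part of OPenn itself: the regular expression is an assumption derived from the prose (document identifier, underscore, serial number, optional web or thumb tag, then .tif or .jpg), and the sample names are built from the example values 0284 and 0003 mentioned in the text.

```python
import re
from typing import NamedTuple


class ImageName(NamedTuple):
    document_id: str
    serial: str
    derivative: str  # "master", "web", or "thumb"
    extension: str   # "tif" or "jpg"


# Assumed pattern: <document id>_<serial>[_web|_thumb].<tif|jpg>
_NAME_RE = re.compile(
    r"^(?P<doc>[^_]+)_(?P<serial>\d+)(?:_(?P<tag>web|thumb))?\.(?P<ext>tif|jpg)$"
)


def parse_image_name(name: str) -> ImageName:
    """Split an image file name into its parts; a missing tag means master."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognized image file name: {name!r}")
    return ImageName(
        match["doc"], match["serial"], match["tag"] or "master", match["ext"]
    )


# Sample names assembled from the example values in the text (illustrative only).
print(parse_image_name("0284_0003.tif"))
print(parse_image_name("0284_0003_web.jpg"))
```

A real package may contain names this pattern does not anticipate, so treat a ValueError as a prompt to inspect the file rather than as proof that the file is invalid.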
The following file names are for the master, web, and thumbnail images for LJS 16, image serial number 0284:
XMP sidecar files
Each image is accompanied by an XMP 'sidecar' file that contains the image's metadata. Each sidecar file has the name of the image with an additional .xmp extension:
See below for more information on the XMP metadata.
Finding the file you want
Image subject names are made available in two ways: through a human-readable browse page and through a TEI manuscript description.
First, each document's browse page lists the images in order with content names ('folio 1a', 'front flyleaf 1a', etc.) and associated file names, as can be seen here:
Second, each TEI manuscript description lists all images in order in the TEI file's <facsimile> section. Note this fragment from ljs168_TEI.xml:
TEI manuscript description is described in greater detail below.
Manuscript packaging & preservation metadata
Each object's images and metadata are presented in a regular package structure that allows for automated navigation of the package and its contents.
The directories have this structure:
This diagram shows part of a typical package with files:
The package is divided into the top-level directory (in this case ljs319), which contains package metadata, and the data itself, found here in the directory ljs319/data. The data directory contains the manuscript description and the image files and their metadata. Each of these is described below.
Core and 'extra' images
Core document images are in the package's data/master, data/thumb, and data/web directories. All of these images are listed in the <facsimile> section of the TEI manuscript description. Any other files provided with the document, like color and ruler reference shots, are included in the data/extra directory in master, thumb, and web sub-directories.
Package metadata
The top-level directory contains the data directory and the package metadata.
There are two package metadata files: manifest-sha1.txt and version.txt. The first lists each file in the data directory with its SHA-1 checksum. The second provides information for the package version.
See below under 'Preservation and technical metadata' for more on the manifest and version files.
Preservation and technical metadata
Package contents and integrity
The top-level directory of each package contains a manifest-sha1.txt file that lists each file in the package's data directory with its SHA-1 checksum.
The format of manifest-sha1.txt follows the output format of the GNU sha1sum program:
Checksums can be used to confirm a file's integrity; that is, that it has not changed since its checksum was recorded.
On Mac OS, Linux, and other Unix-like operating systems, verification can be done using sha1sum or a similar command-line utility (for example, shasum on Mac OS).
Running sha1sum on a file will print its checksum and name:
This checksum value can be used to confirm that the file has remained unchanged. Note that the checksum printed for data/ljs319_TEI.xml by sha1sum is identical to the one listed in the above excerpt from the manifest-sha1.txt file.
sha1sum can also be used with the -c flag to check an entire manifest:
There are checksum verification programs for all modern operating systems. Each behaves differently. Familiarize yourself with the one you choose. Here are some examples:
- Microsoft File Checksum Integrity Verifier (Windows)
- Mac OS X: How to verify a SHA-1 digest (Mac)
- sha1sum(1) - Linux man page (Linux)
- Comparison of file verification software (Wikipedia)
For more information see the SHA-1 Wikipedia page.
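Where no sha1sum-style utility is at hand, the same check can be scripted. The following Python sketch is an illustration under stated assumptions rather than an OPenn tool: it assumes each manifest line holds a checksum followed by whitespace and a path relative to the package's top-level directory, as in GNU sha1sum output. The function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA-1 digest, reading in chunks to handle large TIFFs."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(text: str) -> Dict[str, str]:
    """Map each listed path to its expected checksum (assumed sha1sum layout)."""
    entries: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        checksum, name = line.split(maxsplit=1)
        # sha1sum marks binary-mode entries with a leading "*" before the name.
        entries[name.lstrip("*").strip()] = checksum.lower()
    return entries


def verify_package(package_dir: Path) -> List[str]:
    """Return the listed paths that are missing or whose checksum differs."""
    manifest_text = (package_dir / "manifest-sha1.txt").read_text(encoding="utf-8")
    failures = []
    for name, expected in parse_manifest(manifest_text).items():
        target = package_dir / name
        if not target.is_file() or sha1_of(target) != expected:
            failures.append(name)
    return failures
```

Calling verify_package(Path("ljs319")) on a downloaded package would return an empty list if every file listed in the manifest matches its recorded checksum.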
Package version
It should be a rare occurrence, but from time to time packages will need to be updated. OPenn does not yet have a full system for managing package versions; however, in anticipation of that system each package is provided with a version.txt file in its top-level directory:
The following is the version.txt file for LJS 319.
The file contains one or more dash-separated stanzas, one for each version of a package. The top stanza describes the most recent version of the package. The structure of each stanza is:
- version: three-part semantic version number; e.g., 1.0.0, 1.0.1, or 1.1.0
- date: timestamp of this version's creation
- id: database identifier of this version
- document: database identifier of the package document
- description: the reason for this version
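A stanza-based file like this is straightforward to read programmatically. The sketch below is a guess at the layout, not a specification: it assumes each field sits on its own line as "name: value" and that stanzas are separated by a line made up only of dashes. The sample text and its values are hypothetical; check them against a real version.txt before relying on the parser.

```python
from typing import Dict, List


def parse_version_text(text: str) -> List[Dict[str, str]]:
    """Split version.txt into stanzas, most recent first (assumed layout)."""
    stanzas: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line and set(line) == {"-"}:
            # A line of dashes closes the stanza collected so far.
            if current:
                stanzas.append(current)
                current = {}
            continue
        if ":" in line:
            # Split on the first colon only, so timestamps keep their colons.
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


# Hypothetical sample; field values are invented for illustration.
SAMPLE_VERSION_TEXT = """---
version: 1.0.1
date: 2015-04-01T10:00:00
description: Corrected a spelling error in the metadata
---
version: 1.0.0
date: 2015-03-24T09:00:00
description: Initial version
"""

latest = parse_version_text(SAMPLE_VERSION_TEXT)[0]
print(latest["version"], latest["date"])
```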
Semantic versioning
OPenn uses semantic versions with a three-component version number:
Example:
New versions of a package contain alterations of data and metadata content. Version number changes indicate the type of change and whether a new version will likely be compatible with applications built on previous versions of the package.
All OPenn packages are machine readable and follow a regular pattern. Any application that loads OPenn data dynamically should have no problem with changing package contents; however, applications that cache part of the data may fail to work with a new version of a package that, for example, has fewer images or removed metadata.
A change to the last digit (e.g., 1.0.0 to 1.0.1) indicates a patch or correction that does not add or remove data or metadata. The package remains compatible with applications built on the previous version of the package. An example of a patch change would be a spelling correction in metadata.
A minor version change (e.g., 1.0.0 to 1.1.0) indicates the addition of new data or metadata. The package will work with applications built on the previous version. An example of a minor change would be the addition of new metadata to the document's manuscript description or the addition of new images to the data set. While the new version will work as before, it may be desirable to update software to take advantage of new data.
A major version change (e.g., 1.1.0 to 2.0.0) indicates the removal of data or metadata or other substantive change that will likely cause this version to not work with software built on a previous version of the package.
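An application that caches package data could use these rules to decide whether a cached copy is still safe to use. The following Python sketch encodes the three cases described above; the function names are hypothetical and the logic assumes well-formed three-part version numbers.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Turn '1.0.1' into (1, 0, 1); raises ValueError if not three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def change_type(old: str, new: str) -> str:
    """Classify a version bump as 'major', 'minor', or 'patch'."""
    old_parts, new_parts = parse_version(old), parse_version(new)
    if new_parts <= old_parts:
        raise ValueError(f"{new} is not newer than {old}")
    if new_parts[0] != old_parts[0]:
        return "major"
    if new_parts[1] != old_parts[1]:
        return "minor"
    return "patch"


def is_compatible(old: str, new: str) -> bool:
    """Patch and minor changes keep compatibility; major changes may not."""
    return change_type(old, new) != "major"


print(change_type("1.0.0", "1.0.1"))
print(change_type("1.0.0", "1.1.0"))
print(change_type("1.1.0", "2.0.0"))
```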
Descriptive and structural metadata
A TEI file like ljs319_TEI.xml provides descriptive and structural metadata for each document. The file is stored and named as follows:
Example:
The TEI file name always contains the name of the top-level package directory.
See the section TEI manuscript description below for a detailed description of the file.
XMP
Each image file has key metadata stored in its header. This information is also included in a .xmp sidecar file for each image:
The XMP file includes Dublin Core and technical metadata and rights information. What follows is the content of a sample XMP file.
Notable XMP elements
Dublin Core elements:
- creator -- person or organization responsible for creating the image
  - example: 'The University of Pennsylvania Libraries'
- date -- date of the creation of this version of the image, including metadata
  - example: '2015-03-24'
- description -- brief description of the image content
  - example: 'This is an image of fol. 1r from University of Pennsylvania LJS 319: Derrota, from Manila, Philippines, dated to approximately 1750.'
- format -- MIME type of the image, either image/tiff or image/jpeg
- identifier -- unique identifier for the master image and its derivatives
  - example: '311.64390'
- publisher -- person or organization responsible for publication of the image
  - example: 'The University of Pennsylvania Libraries'
- relation -- a related resource
  - example: 'University of Pennsylvania LJS 319'
- rights -- access rights
  - example: 'This image and its content are free of known copyright restrictions and in the public domain. See the Creative Commons Public Domain Mark page for usage details, http://creativecommons.org/publicdomain/mark/1.0/.'
- subject -- a list of subjects
  - examples: 'Navigation--Early works to 1800', 'Pilot guides--Philippines'
- title -- the title of the image
  - example: 'University of Pennsylvania LJS 319: Derrota, fol. 1r'
- type -- the resource type, always 'image'
Photoshop element:
- Source -- the source of the image content
  - example: 'University of Pennsylvania LJS 319, fol. 1r'
xmpRights elements:
- Marked -- whether this is a rights-managed resource; 'False' if Public Domain, 'True' otherwise
- UsageTerms -- a description of the terms of usage for this resource
  - example: 'This image and its content are free of known copyright restrictions and in the public domain. See the Creative Commons Public Domain Mark page for usage details, http://creativecommons.org/publicdomain/mark/1.0/.'
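The Dublin Core values listed above can be pulled out of a sidecar file with a standard XML parser. The following Python sketch is illustrative only: the sample packet is hand-written and much shorter than a real sidecar file, and it assumes the usual XMP serialization in which single values appear as element text and multi-valued properties appear as rdf:li items. The values are taken from the examples above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

DC = "http://purl.org/dc/elements/1.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def dublin_core_values(xmp_text: str) -> Dict[str, List[str]]:
    """Collect Dublin Core values from an XMP packet, keyed by element name."""
    root = ET.fromstring(xmp_text)
    values: Dict[str, List[str]] = {}
    for element in root.iter():
        if not element.tag.startswith("{" + DC + "}"):
            continue
        name = element.tag.split("}", 1)[1]
        # Multi-valued properties are wrapped in rdf:Alt, rdf:Bag, or rdf:Seq.
        items = [li.text or "" for li in element.iter("{" + RDF + "}li")]
        if not items and element.text and element.text.strip():
            items = [element.text.strip()]
        values.setdefault(name, []).extend(items)
    return values


# Hand-written, abbreviated sample; a real sidecar file holds more properties.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:format>image/tiff</dc:format>
      <dc:title><rdf:Alt><rdf:li xml:lang="x-default">University of Pennsylvania LJS 319: Derrota, fol. 1r</rdf:li></rdf:Alt></dc:title>
      <dc:subject><rdf:Bag><rdf:li>Navigation--Early works to 1800</rdf:li><rdf:li>Pilot guides--Philippines</rdf:li></rdf:Bag></dc:subject>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

for name, items in dublin_core_values(SAMPLE_XMP).items():
    print(name, items)
```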
TEI document description
Each document package includes a TEI file that provides a manuscript description and structural metadata that maps images to the pages of the document. TEI files comply with the TEI P5 Guidelines.
The following TEI tags are employed:
The description title
The TEI titleStmt contains the description title.
Element:
Example:
Publication information
The TEI publicationStmt contains the publisher and licensing information.
Elements:
Example:
General notes
The TEI notesStmt contains general notes about the document.
Element:
Example:
Document identification
The TEI msIdentifier contains identification information. Each document is primarily identified by its repository and call number.
Elements:
Example:
Document abstract and summary
The TEI summary element contains a long-form description of the document.
Element:
Example:
Language information
The TEI textLang element contains information about the document's languages.
Element:
Example:
Content information
The description's first TEI msContents/msItem element contains a detailed description of the contents of the document as a whole. This information includes the document title, authors, other contributors (scribe, artist, etc.), and colophon.
Elements:
Example:
Subdivision content information
TEI msItem elements after the first msItem contain section and chapter titles. These elements can be distinguished from the general document-level msItem by the presence of the @n attribute and a child locus element.
Elements:
Example:
The msItem/@n attribute corresponds to the facsimile/surface element with the same @n attribute.
Document support description
The TEI supportDesc element contains information about the document's support, including support material, collation information, extent, foliation (or pagination), and watermark.
Elements:
Example:
Layout information
The TEI layoutDesc contains a description of the document's layout.
Element:
Example:
Script and palaeographic information
The TEI scriptNote element contains a description of the document's script.
Element:
Example:
Decorations
Elements:
The TEI decoDesc element contains descriptions of decorative and figurative features of the document. A decoNote without an @n attribute provides a general description of decorative features. A decoNote with an @n attribute corresponds to the facsimile/surface element with the same @n attribute.
Element:
Example:
Binding
The TEI bindingDesc element contains a description of the document's binding.
Element:
Example:
Document history
The TEI history element contains information about the document's history, including its date and place of origin and provenance history.
Elements:
Example:
Keywords and genre
TEI keywords elements contain genre and subject information about the document.
Elements:
Example:
Structural metadata
The TEI facsimile element lists the imaged parts of the document, in order, with their names, linked to the document's images. The surface/@n attribute contains the part's name or page/folio number.
Elements:
Example:
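Separately from the element listing, the mapping from page names to image files can be read with a short script. The Python sketch below is an illustration, not OPenn's own tooling: it assumes each surface holds graphic children with a url attribute, as TEI P5 allows, and the sample fragment with its file names is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def list_surfaces(tei_text: str) -> List[Tuple[str, List[str]]]:
    """Return (surface name, image URLs) pairs in document order."""
    root = ET.fromstring(tei_text)
    surfaces = []
    for surface in root.findall(".//tei:facsimile/tei:surface", TEI_NS):
        urls = [g.get("url", "") for g in surface.findall("tei:graphic", TEI_NS)]
        surfaces.append((surface.get("n", ""), urls))
    return surfaces


# Hypothetical fragment; names and paths are invented for illustration.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <facsimile>
    <surface n="Front cover">
      <graphic url="master/0001_0000.tif"/>
      <graphic url="web/0001_0000_web.jpg"/>
    </surface>
    <surface n="1r">
      <graphic url="master/0001_0001.tif"/>
    </surface>
  </facsimile>
</TEI>"""

for name, urls in list_surfaces(SAMPLE_TEI):
    print(name, urls)
```

Because msItem/@n and decoNote/@n share values with surface/@n, the names returned here can serve as lookup keys when linking content descriptions to images.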
Standards
OPenn data and metadata adhere to international standards. The following is a list of the most important of those.
- Dublin Core: each image includes descriptive Dublin Core metadata based on the Dublin Core Metadata Elements (DCME); for more information on DCME, see the Dublin Core site
- TEI P5: manuscript description information is encoded according to Text Encoding Initiative (TEI) P5 guidelines
- TIFF: when available, TIFF images are used for master images; the TIFF specification is available as PDF from the Adobe website
- Unicode: text information in XML files and other text documents is in Unicode, typically with UTF-8 encoding; visit Unicode.org for information on the Unicode standard
- XMP: Extensible Metadata Platform; all images have XMP-encoded metadata in their headers and are accompanied by XMP sidecar files
Appendix: Downloading files with wget
This section provides instructions for using wget to download files from OPenn. Wget is a command-line utility available for Linux, Mac OS, and Windows.
Installing wget
First, you’ll need to install wget on your computer.
Mac OS
On a Mac you can install wget directly (see 'Install and configure wget on OS X') or, if you already have the Homebrew package installer, you can use it to install wget.
Windows
Download the appropriate setup*.exe file from http://cygwin.com/install.html. Double-click setup*.exe and choose 'Install from Internet'. Follow the prompts until you are asked to choose a download site for cygwin. Choose any site and continue. Follow the prompts again, until you get to the 'Select Packages' page. Click the + next to Web (you may need to scroll down), then click directly on 'Skip' and select the first box next to 'wget: Utility to retrieve files from the WWW via HTTP and FTP'. Click next and accept any dependencies. Download and installation may take a few minutes.
Navigating the command line
Cygwin will install its own folders. Wget will download files into these folders, and you can move the files later.
On a Mac, open your Terminal program. It will probably open in your home directory. On Windows, open the Cygwin terminal.
Your command prompt will look something like this, ending with a $:
To move into a different directory, use the cd command:
Your command prompt will reflect your new location:
To see all the files and folders available to you, use the ls command:
To create a new folder, use the mkdir command:
More information about these commands and others can be found on this OS X command line cheat sheet.
Now on to wget.
Using wget
The basic wget command will download a single file into the directory you are in. So
will download the index.html page at that address. However, this is probably not what you want. You want to download image and metadata files, either for the entire repository or for specific manuscripts. There are a number of different options that will allow you to control what exactly gets downloaded, and where those files are placed on your computer.
wget Recipes
Download a single file
I want to download a single image for a specific manuscript:
This will bring down only the image that you specify. You can use the same command to download the XML manuscript description:
Download multiple files
You can also use wget to bulk-download files.
I want to download all of the LJS Manuscript data, including master, thumbnail, and web images, and XML manuscript descriptions, in the directory structure used on the OPenn site:
- wget = use the wget program
- -np = 'no parent', this means do not download any files that are in the folders containing the 0001 folder
- -r = 'recursive', this means download files directly in the 0001 folder, and also download any files that are in folders inside that folder (without this option, you would only get those files directly inside the 0001 folder)
- http://openn.library.upenn.edu/Data/0001/ = start download from this location
I want to download only the XML manuscript descriptions and JPEG files (thumbnails and web images) for a single manuscript. All files are saved in a folder named ljs225:
- wget = use the wget program
- -nd = 'no directory', this means do not use the directory structure from OPenn, put all the files into a folder specified by me
- -np = 'no parent', see above
- -r = 'recursive', see above
- -A.jpg = 'accept list', accept only .jpg files
- -A.xml = 'accept list', accept only .xml files
- -P openn/ljs225 = 'directory prefix', the folder to which all the files will be downloaded
- http://openn.library.upenn.edu/Data/0001/ljs225/ = start download from this location
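When the same recipe is needed for many manuscripts, it can help to assemble the command in a script. The Python sketch below only builds and prints the argument list from the options described above; it does not run wget or touch the network. The function name is hypothetical, and the flags are copied from the list above rather than from a tested invocation.

```python
import shlex
from typing import List, Sequence


def wget_command(url: str, dest: str, extensions: Sequence[str]) -> List[str]:
    """Build a wget argument list: flat output, no parent, recursive, filtered."""
    args = ["wget", "-nd", "-np", "-r"]
    # One accept flag per extension, mirroring -A.jpg and -A.xml above.
    args += [f"-A.{ext}" for ext in extensions]
    args += ["-P", dest, url]
    return args


command = wget_command(
    "http://openn.library.upenn.edu/Data/0001/ljs225/",
    "openn/ljs225",
    ["jpg", "xml"],
)
# Print a copy-and-paste form of the command for review before running it.
print(shlex.join(command))
```

To execute the command rather than print it, the list could be passed to subprocess.run(command, check=True) on a machine where wget is installed.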
I want to download all the web JPEGs for all the manuscripts in OPenn to a folder called data/web.
- wget = use the wget program
- -nd = 'no directory', see above
- -np = 'no parent', see above
- -r = 'recursive', see above
- -A.xml = 'accept list', accept only .xml files
- -P openn/msDesc = 'directory prefix', see above
- http://openn.library.upenn.edu/Data/ = start download from this location
You can combine the different options to specify exactly what you want to download.
Appendix: Downloading files with rsync
Rsync is a command-line Remote SYNChronization utility designed to maintain duplicate copies of data on remote machines. It is also a very powerful tool for the bulk downloading of files. The instructions below show how to install rsync and use it to download files from OPenn.
One advantage rsync has over other tools is that it does, by default, synchronize two directories, usually one on a remote server and one on a local computer. This means that rsync can be run multiple times on the same two directories and it will only copy new and changed files from the source to the destination. It can also be set up not just to copy new and changed files, but also to remove files from the destination that are no longer on the source, and thus keep two file systems truly synchronized.
The Linux manual page for rsync is here: http://linux.die.net/man/1/rsync. Note that rsync is different for each operating system. For complete rsync documentation for your system, view the rsync man page (man rsync).
Rsync commands can be quite complex and tricky to get working just right. There are ample resources on the web for answering particular rsync questions. The samples below show basic usage of rsync for copying data.
Installing rsync
First, you’ll need to install rsync on your computer.
Mac OS & Linux
Mac OS ships with rsync installed.
If your Linux system does not have rsync installed, you can install it with your package management software.
Windows
Download the appropriate setup*.exe file from http://cygwin.com/install.html. Double-click setup*.exe and choose 'Install from Internet'. Follow the prompts until you are asked to choose a download site for cygwin. Choose any site and continue. Follow the prompts again, until you get to the 'Select Packages' page. Click the + next to Net (you may need to scroll down), then click directly on 'Skip' and select the first box next to 'rsync'. Click next and accept any dependencies. Download and installation may take a few minutes.
Navigating the command line
Cygwin will install its own folders. Rsync will download files into these folders, and you can move the files later.
On a Mac, open your Terminal program. It will probably open in your home directory. On Windows, open the Cygwin terminal.
Your command prompt will look something like this, ending with a $:
To move into a different directory, use the cd command:
Your command prompt will reflect your new location:
To see all the files and folders available to you, use the ls command:
To create a new folder, use the mkdir command:
More information about these commands and others can be found on this OS X command line cheat sheet.
Now on to rsync.
Using rsync
The basic rsync command, when issued against a site providing anonymous rsync like OPenn, will list a directory's contents:
Adding a subfolder to the above command will give a list of items in that folder:
Note the trailing / character after Data and 0001.
Downloading an entire document
You can pull down an entire document from OPenn by entering the path to its directory. This command will download all of LJS 49 to the user tom's Manuscripts directory:
Note that the final \ character on the first and second lines is used to break up the long line. If entered on the command line, the \ must be the last character on the line and cannot be followed by spaces.
That command will silently retrieve all of LJS 49. To get more detailed information about what is happening, you could use a command like the following:
Be aware that the data set is quite large, and the images for a single manuscript can be over 100 GB.
Download select document images
You can pull down a specific set of images for a document (master TIFFs, or web or thumbnail JPEGs) by specifying the image folder. This command will retrieve all web JPEGs for manuscript LJS 49:
Rsync also offers the ability to select source files by file name pattern, so that a very precise selection of files for download can be made.
Mirroring all of OPenn
You can use rsync to mirror OPenn. Here is a command that will do a simple copy of all of OPenn to another file system:
Here the --delete option will delete any files in /var/www/html not found on OPenn. This command can be run regularly to keep an up-to-date local copy of OPenn on your system. In production, you would want to fine-tune this command to your situation.
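Because --delete removes files from the destination, it is worth reviewing a mirror command before running it. The Python sketch below only assembles an argument list; it does not run rsync. It is not the command from this ReadMe: the function name is hypothetical, the source address is a placeholder to be replaced with the actual OPenn rsync path, and --archive, --verbose, and --dry-run are standard rsync options chosen here for illustration.

```python
import shlex
from typing import List


def rsync_mirror_command(
    source: str, dest: str, delete: bool = False, dry_run: bool = True
) -> List[str]:
    """Build an rsync argument list for mirroring source into dest."""
    args = ["rsync", "--archive", "--verbose"]
    if delete:
        # Remove destination files that no longer exist on the source.
        args.append("--delete")
    if dry_run:
        # Report what would change without copying or deleting anything.
        args.append("--dry-run")
    args += [source, dest]
    return args


# Placeholder source; substitute the real rsync address before use.
preview = rsync_mirror_command(
    "rsync://example.org/module/", "/var/www/html", delete=True
)
print(shlex.join(preview))
```

Defaulting dry_run to True is a deliberate choice in this sketch, so that a first run shows what would be deleted instead of deleting it.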
As noted above, rsync is extremely powerful and flexible. Experiment with rsync and look among the many resources on rsync on the Web to learn more about rsync and using it for your needs.