Name | Last modified | Size | Description | |
---|---|---|---|---|
Parent Directory | - | |||
DataImport.xhtml | 2007-03-12 09:01 | 9.4K | ||
csvToTk | 2007-03-12 08:38 | 14K | ||
toydb/ | 2010-09-20 03:42 | - | ||
csvToTk is a perl script that converts CSV files to ingest files suitable for import by the Archivist Toolkit.
run csvToTk -h to see the options available
As a courtesy it trims leading and trailing spaces from accession and name properties.
csvToTk the script it requires the DBI module, if it is not installed on your machine get it from cpan toydb a toy database example with 2 accessions and 4 names toydb/Accessions the Accessions table in csv formats toydb/Names the Name table in csv formats toydb/Links the Links table in csv formats connecting accessions with names
At Georgia Tech we do not use all possible accession/name properties available in the toolkit. To add or remove property fields that are or are not used at your institution you need to edit the script slightly; that is you need to adjust the field name lists it uses, see FIELD_NAMES
> csvToTk -d toydb -A Accesions.txt -N Names.txt +++ Accessions: File 'Accesions.txt' 2 OK 0 FAILED Name record on line 6 in toydb/Names is kaput, has bad name type: (Name_contactAddress1) => (Elmer Lane) (Name_contactPhone) => (444-222-0101) (Name_contactCountry) => (USA) (Name_contactMailCode) => (0001) (Name_corporatePrimaryName) => (Institute For Sticky Bits) (Name_contactCity) => (Stuckton) (Name_nameType) => (person and corporate) (Name_personalPrimaryName) => (Glue) (Name_contactEmail) => (glitter@glue.com) (Name_personalRestOfName) => (Glitter) +++ Names: File 'Names.txt' 4 OK 1 FAILEDThis produces the Accessions.txt and Names.txt files. csvToTk reports an error for record on line 6 in toydb/ Names because it contains person and corporate information. This record does not become part of the ingest file.
2 accessions are created as a result of ingesting Accessions.txt. 4 names and 4 more accessions are created as a result of ingesting Names.txt. The last 4 accessions have the numbers 9999.000.2/3/4/5. The 2, 3, 4, and 5 correspond to the lines of the dbToys/Names table in which the name for this accession was defined. Creating fake accessions for names is an ugly work around necessary since the toolkit requires that names are ingested only in connection with accessions.
> csvToTk -d toydb -t toolkit_test -L Links.sql Can't find Accession (2006.1214.3) in toolkit database (Name_contactAddress1) => (Elmer Lane) (tk_nameFunction) => (Creator) (Name_contactCountry) => (USA) (Name_contactPhone) => (444-222-0101) (Name_contactMailCode) => (0001) (Name_contactCity) => (Stuckton) (Name_personalPrimaryName) => (Glue) (tk_nameId) => (39) (Name_contactEmail) => (glitter@glue.com) (accessionNumber) => (2006.1214.3) (Name_personalRestOfName) => (Glitter ) Link Record on line 5 in toydb/Links is kaput Can't find Name in toydb/Names; (tk_nameFunction) => (Source) (Name_corporatePrimaryName) => (Intitute For Broken Bits) (Name_contactEmail) => (sticky@bits.edu) (accessionNumber) => (2006.1214.2) Link Record on line 6 in toydb/Links is kaput Can't find Name in toolkit_test database (Name_contactAddress1) => (Elmer Lane) (tk_nameFunction) => (Source) (Name_contactCountry) => (USA) (Name_contactPhone) => (444-222-0101) (Name_corporatePrimaryName) => (Institute For Sticky Bits) (Name_contactMailCode) => (0001) (Name_contactCity) => (Stuckton) (Name_personalPrimaryName) => (Glue) (Name_contactEmail) => (glitter@glue.com) (accessionNumber) => (2006.1214.2) (Name_personalRestOfName) => (Glitter) Link Record on line 7 in toydb/Links is kaput +++ Links: File 'Links.sql' 3 OK 3 FAILED
The script reads links information from toydb/Links. Each record/line in the csv table connects an accession with a name. Each line/record in toydb/Links is expected to
The script prints error messages if any of the conditions are violated. In the example the first error complains that the link defined on line 5 in toydb/Link refers to an accession with an unknown number. The second error complains about the name not corresponding to anything in toydb/Names, 'Intitute For Broken Bits' is misspelled. The third error is a follow up error. Since the record from line 7 in toydb/Names was not ingested because it contains person and corporate information, it can not be found in the toolkit's database.
The script writes an INSERT sql statement to Links.sql that will connect the toolkit's accession with the given name if (and only if) the a link seems to be 'good'. It uses the tk_nameFunction value from the Links table to determine the function of this name. Allowed values are "Source", "Creator", iand "Subject".
Accession numbers in Link tables are split into 4 parts using '.' as deliminator. For eample 2006.1214.2 is interpreted as
accessionNumber1 '2006'
accessionNumber2 '1214'
accessionNumber3 '2'
accessionNumber4 ''
"accessionDate", "accessionNumber", "title", "extentNumber", "extentType", "acquistionType", "agreementReceived", "agreementSent", "inventory", "processingPlan", "accessRestrictions", "restrictionsApply", "rightsTransferred", "cataloged", "catalogedNote", "accessionDispositionNote", "conditionNote", "description"and the following name fields;
"Name_corporatePrimaryName", "Name_corporateSubordinate1", "Name_corporateSubordinate2", "Name_contactAddress1", "Name_contactCity", "Name_contactRegion", "Name_contactMailCode", "Name_contactCountry", "Name_contactPhone", "Name_contactEmail", "Name_personalPrimaryName", "Name_personalRestOfName", "Name_contactAddress1", "Name_contactCity", "Name_contactRegion", "Name_contactMailCode", "Name_contactCountry", "Name_contactPhone", "Name_contactEmail"To adjust the script edit the lists
my @accession_fields my @names_fields