Index of /toolkit/csvToTk

 NameLast modifiedSizeDescription

 Parent Directory   -  
 DataImport.xhtml 2007-03-12 09:01 9.4K 
 csvToTk 2007-03-12 08:38 14K 
 toydb/ 2010-09-20 03:42 -  



csvToTk is a perl script that converts CSV files to ingest files suitable for import by the Archivist Toolkit.

run csvToTk -h to see the options available

As a courtesy it trims leading and trailing spaces from accession and name properties.


csvToTk the script 
    it requires the DBI module, 
    if it is not installed on your machine get it from cpan 

toydb a toy database example with 2 accessions and 4 names  
      the Accessions table in csv formats  
      the Name table in csv formats  
      the Links table in csv formats connecting accessions with names  


  1. prepare a directory with three files:
    the directory toydb contains example files. File names may not include periods ('.').

    At Georgia Tech we do not use all possible accession/name properties available in the toolkit. To add or remove property fields that are or are not used at your institution you need to edit the script slightly; that is you need to  adjust the field name lists it uses, see FIELD_NAMES

  2. use csvToTk to produce ingest files for the Accessions and Names
    leading and trailing space characters are removed from Accession and Names properties.
  3. import the ingest files into the Toolkit
  4. delete the fake accessions that are generated to create the name information
  5. run csvToTk again to generate an sql file that links accessions with their sources/creators, and subjects
  6. import the sql file into your toolkit database


The above steps are detailed here using the Accession, Names, and Links tables from the toydb, that come with csvToTk. In this example we assume that the Archivit Toolkit's database is called toolkit_test.
  1. toydb/* are comma separated tables describing accessions, names, and the way they are linked. These files can be edited imported and exported by all coomonly available spreadsheet tools. The tables should be self explanatory.
  2. generate ingest files
    > csvToTk -d toydb -A Accesions.txt -N Names.txt
    +++ Accessions: File 'Accesions.txt'    2 OK    0 FAILED
    Name record on line 6 in toydb/Names is kaput, has bad name type:
            (Name_contactAddress1) => (Elmer Lane)
            (Name_contactPhone) => (444-222-0101)
            (Name_contactCountry) => (USA)
            (Name_contactMailCode) => (0001)
            (Name_corporatePrimaryName) => (Institute For Sticky Bits)
            (Name_contactCity) => (Stuckton)
            (Name_nameType) => (person and corporate)
            (Name_personalPrimaryName) => (Glue)
            (Name_contactEmail) => (
            (Name_personalRestOfName) => (Glitter)
    +++ Names:      File 'Names.txt'        4 OK    1 FAILED
    This produces the Accessions.txt and Names.txt files. csvToTk reports an error for record on line 6 in toydb/ Names because it contains person and corporate information.  This record does not become part of the ingest file.
  3. Import Names.txt and Accessions.txt into the toolkit by using the
    Admin|Import|Import Accessions menu item.

    2 accessions are created as a result of ingesting Accessions.txt.  4 names and 4 more accessions are created as a result of ingesting Names.txt. The last 4 accessions have the numbers 9999.000.2/3/4/5. The 2, 3, 4, and 5 correspond to the lines of the dbToys/Names table in which the name for this accession was defined. Creating fake accessions for names is an ugly work around necessary since the toolkit requires that names are ingested only in connection with accessions.

  4. Remove the fake accesions. 
  5. Generate sql to connect accessions to names. In the example we assume that the toolkit's database is called 'toolkit_test':
  6. > csvToTk -d toydb -t toolkit_test -L Links.sql
    Can't find Accession (2006.1214.3) in toolkit database
            (Name_contactAddress1) => (Elmer Lane)
            (tk_nameFunction) => (Creator)
            (Name_contactCountry) => (USA)
            (Name_contactPhone) => (444-222-0101)
            (Name_contactMailCode) => (0001)
            (Name_contactCity) => (Stuckton)
            (Name_personalPrimaryName) => (Glue)
            (tk_nameId) => (39)
            (Name_contactEmail) => (
            (accessionNumber) => (2006.1214.3)
            (Name_personalRestOfName) => (Glitter )
    Link Record on line 5 in toydb/Links is kaput
    Can't find Name in toydb/Names;
            (tk_nameFunction) => (Source)
            (Name_corporatePrimaryName) => (Intitute For Broken Bits)
            (Name_contactEmail) => (
            (accessionNumber) => (2006.1214.2)
    Link Record on line 6 in toydb/Links is kaput
    Can't find Name in toolkit_test database
            (Name_contactAddress1) => (Elmer Lane)
            (tk_nameFunction) => (Source)
            (Name_contactCountry) => (USA)
            (Name_contactPhone) => (444-222-0101)
            (Name_corporatePrimaryName) => (Institute For Sticky Bits)
            (Name_contactMailCode) => (0001)
            (Name_contactCity) => (Stuckton)
            (Name_personalPrimaryName) => (Glue)
            (Name_contactEmail) => (
            (accessionNumber) => (2006.1214.2)
            (Name_personalRestOfName) => (Glitter)
    Link Record on line 7 in toydb/Links is kaput
    +++ Links:      File 'Links.sql'        3 OK    3 FAILED

    The script reads links information from toydb/Links. Each record/line in the csv table connects an accession with a name. Each line/record in toydb/Links is expected to

    The script prints error messages if any of the conditions are violated. In the example the first error complains that the link defined on line 5 in toydb/Link refers to an accession with an unknown number. The second error complains about the name not corresponding to anything in toydb/Names, 'Intitute For Broken Bits' is misspelled. The third error is a follow up error. Since the record from line 7 in toydb/Names was not ingested because it contains person and corporate information, it can not be found in the toolkit's database.

    The script writes an INSERT sql statement to Links.sql that will connect the toolkit's accession with the given name if (and only if) the a link seems to be 'good'. It uses the tk_nameFunction value from the Links table to determine the function of this name. Allowed values are "Source", "Creator", iand "Subject".

    Accession numbers in Link tables are split into 4 parts using '.' as deliminator.  For eample 2006.1214.2 is  interpreted as

              accessionNumber1	'2006'
    accessionNumber2 '1214'
    accessionNumber3 '2'
    accessionNumber4 ''

  7. Import Links.sql This can be done by running mysql from the command line:
       mysql -u <mysql-user> toolkit_test < Links.sql
    or in any MySql adminsistrator tool.
    Make sure you do this step exactly once, since each time Links.sql is imported a new set of links is created. There is no checking whether an aceesion is already linked to a particular name.


By default the script uses the following accession fields;
and the following name fields;
To adjust the script edit the lists
   my @accession_fields
   my @names_fields