Data Management

  • Analyze

    Multiple tools are available to assist in data analysis. Below are best practices and tools that can be used for statistical analysis and visualization of data.

    Best Practices

    DataONE provides multiple best practice guidelines to help in the analysis of data, including considerations on the compatibility of any data that is being integrated, the description of methods to create derived data products, the documentation of steps used in data processing, and identifying outliers, among other analysis topics.

    Data Output, Statistical Analysis, and Visualization

    The Tulane Business Intelligence and Analytics (BIA) Team supports the University's business intelligence, business analytics, advanced analytics, data warehousing, data integration, self-service analytics, and enterprise planning practice areas, in addition to maintaining the University's data warehouses and various reporting sources and processes.

    Numerous open source and proprietary software packages exist to help with data analysis and visualization. See Technology Services for a complete list of software licensed by Tulane University.

    • R is a free, open-source language and environment for statistical computing and graphics. Downloads are available for Windows, macOS, and a variety of UNIX platforms.
    • STATA is a complete, integrated statistical software package that provides everything you need for data analysis, data management, and graphics, produced by StataCorp.
    • SAS is a software suite developed by SAS Institute for advanced analytics, business intelligence, data management, and predictive analytics.
    • SPSS is a software package used for statistical analysis popular in fields including the health sciences and marketing.
    • ArcGIS is a comprehensive set of tools for compiling, visualizing, analyzing, editing, managing, and sharing geographic data. For more information about ArcGIS see the GIS research guide.
    • PolicyMap is a data and mapping tool that enables government, commercial, non-profit and academic institutions to access data about communities and markets across the US. Learn more about PolicyMap with the GIS research guide.
    • NCBI Data Analysis Tools allow researchers to manipulate, align, visualize, and evaluate biological data. Analysis tools are broken down into categories: Literature, Health, Genomes, Genes, Proteins, and Chemicals.
    • SankeyBuilder allows you to automatically build a Sankey diagram. Both free and paid accounts are available online.

    Additional tools are available to assist with digital scholarship. See the digital scholarship webpage and research guide for more information.

    Questions? Email

  • Clean and Organize


    Most data analysis and storage requires digital data. Some tools make it easy to create born-digital data sets; for more information, see Prepare and Create Data. On occasion, however, you will need to migrate data from an analog format to a digital one. For more information about digitization, see the following guidelines:


    Some digital data will require translation or transcription. Transcription is the process of creating a text document from the contents of a file (such as closed captioning for an audio or video file), while translation is the process of duplicating the data in another language. Transcription can mean using Optical Character Recognition (OCR) to create a raw text file from a scanned image, or creating a text file for an audio file. Translation, in comparison, might mean creating an English text file for a letter originally written in Spanish. More accurate translations and transcriptions require human intervention, but many machine tools are available for free.

    Examples of tools that do one or both:

    • Otter is an AI transcription tool that includes 600 free minutes per month.
    • Google Docs Voice Typing is a free transcription tool that works well with a variety of languages.
    • Google Translate is a free tool that will transcribe and translate foreign languages. Its mobile app can translate text in real time, and it will also transliterate between non-Roman and Roman scripts.
    • TraveLang Translating Dictionaries is a free tool that allows cross-searching of multiple translation dictionaries.

    Quality Control/Edit

    Quality control is the process of assessing the consistency and accuracy of your data and revising as necessary. The specifics of the quality control your data requires will depend heavily on the type of data you are creating. For more information on quality control, see "file organization", "version control", and "documentation and metadata" on the Data Storage page of our libguide.


    Anonymising your data to protect the privacy of study participants is an important step to take before sharing your data publicly. The following are open-source tools that support anonymisation at various levels, depending on your use case:

    • ARX will not only anonymise your data, it will also analyse the output's utility and privacy risks.
    • NLM-Scrubber is a HIPAA-compliant clinical text de-identification tool.
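    As a toy illustration of the kind of de-identification these tools automate, the sketch below drops direct identifiers and pseudonymises a linking key. The column names are hypothetical examples, and a real study should rely on a dedicated tool such as ARX, which also quantifies re-identification risk.

```python
# Toy de-identification sketch -- not a substitute for ARX or NLM-Scrubber.
# The column names below are hypothetical examples.
import hashlib

DIRECT_IDENTIFIERS = {"name", "email", "phone"}  # drop these columns outright
PSEUDONYMISE = {"participant_id"}                # replace these with a hash

def anonymise(rows):
    """Remove direct identifiers and hash linking keys in a list of dicts."""
    cleaned = []
    for row in rows:
        out = {}
        for column, value in row.items():
            if column in DIRECT_IDENTIFIERS:
                continue  # omit the column entirely
            if column in PSEUDONYMISE:
                # a one-way hash keeps rows linkable without exposing the raw ID
                value = hashlib.sha256(value.encode()).hexdigest()[:12]
            out[column] = value
        cleaned.append(out)
    return cleaned
```

    Note that hashing alone does not guarantee anonymity: quasi-identifiers such as age and zip code can still re-identify participants, which is exactly the risk a tool like ARX measures.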


    The description of data is often called creating "metadata" or "data about data."

    Many of the tools listed above under Quality Control can also store this data in a separate file, such as within the same folder as your primary data. You can also embed metadata directly into your files with tools like Adobe Bridge. Adobe Bridge and ExifTool can harvest data contained in many files by default, such as the size, file type, creation date, and GIS information. The DMPTool is also helpful in assessing what metadata should be included.
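    As a minimal sketch of this pattern in Python: the function below harvests the same default file properties mentioned above (size, file type, modification date) using only the standard library and writes them to a JSON sidecar file in the same folder as the data file. The ".metadata.json" naming convention is our own illustrative choice.

```python
# Harvest basic file metadata (the kind Adobe Bridge or ExifTool collect by
# default) and store it as a JSON sidecar next to the primary data file.
import json
from datetime import datetime, timezone
from pathlib import Path

def write_sidecar(data_file):
    """Record size, type, and modification date in a JSON file next to `data_file`."""
    data_file = Path(data_file)
    stat = data_file.stat()
    metadata = {
        "filename": data_file.name,
        "file_type": data_file.suffix.lstrip("."),
        "size_bytes": stat.st_size,
        "modified_utc": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc
        ).isoformat(),
    }
    # keep the metadata in the same folder as the primary data
    sidecar = data_file.parent / (data_file.name + ".metadata.json")
    sidecar.write_text(json.dumps(metadata, indent=2))
    return sidecar
```

    Running `write_sidecar(Path("survey_results.csv"))` would produce `survey_results.csv.metadata.json` alongside the data file.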

    For more information about metadata for research data, see Documentation and metadata.


  • Prepare

    Preparation for data collection and analysis improves the efficiency of the research process. Funders often require a data management plan (DMP) when submitting a proposal. The requirements for these plans vary depending on the funder and nature of the research; however, they often address the following questions:

    • What types of data will be collected? E.g. Spatial, temporal, instrument-generated, models, simulations, images, video etc.
    • For each type of data file, what are the variables that are expected to be included?
    • What software programs will be used to generate the data?
    • How will the files be organized in a directory structure on a file system or in some other system?
    • Will metadata information be stored separately from the data during the project?
    • What is the relationship between the different types of data?
    • Which of the data products are of primary importance and should be preserved for the long-term, and which are intermediate working versions not of long-term interest?
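    To make the file-organization question above concrete, here is one possible scaffold in Python. The folder names (raw vs. processed data, docs, metadata, scripts) are a common convention, not a funder requirement.

```python
# One illustrative answer to "how will the files be organized?" --
# this layout is a common convention, not a funder requirement.
from pathlib import Path

LAYOUT = ["data/raw", "data/processed", "docs", "metadata", "scripts"]

def scaffold_project(root):
    """Create an empty, consistently named project tree under `root`."""
    root = Path(root)
    for subdir in LAYOUT:
        (root / subdir).mkdir(parents=True, exist_ok=True)
    # return the directories created, relative to the project root
    return sorted(p.relative_to(root).as_posix()
                  for p in root.rglob("*") if p.is_dir())
```

    Separating raw from processed data also makes it easier to answer the last question in the list: raw data is usually the primary product to preserve, while processed intermediates may be reproducible working versions.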

    DMPTool is software that uses funder-specific templates to help researchers write a DMP. It is available to all Tulane affiliates. The research data management team is available to review DMPs and answer any related questions.



  • Store and Preserve

    Storage and preservation of data are essential for the success of a project and the reuse of data produced.


    Tulane University Information Technology provides data storage and retention best practices. Storage options include Box and Cypress.


    Use the 3-2-1 rule when backing up your data.

    • Keep a minimum of three copies of your data and files
    • Keep at least two copies on different storage media (e.g. external hard drive, local drive)
    • Place at least one copy in a different geographical location (e.g. cloud)
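    The rule can be sketched in Python for a single file, assuming the second medium and the off-site location are both reachable as directories (e.g. an external drive mount and a folder synced to the cloud by a client such as Box). The paths are placeholders.

```python
# Sketch of the 3-2-1 rule for one file: three copies, two media, one off-site.
# The target directories are placeholders -- e.g. an external drive mount and
# a folder that a sync client (such as Box) mirrors to the cloud.
import shutil
from pathlib import Path

def backup_321(original, second_medium_dir, offsite_dir):
    """Copy `original` so three copies exist across two media and two sites."""
    original = Path(original)
    copies = [original]  # copy 1: the working file on the local drive
    for target_dir in (second_medium_dir, offsite_dir):
        target = Path(target_dir) / original.name
        shutil.copy2(original, target)  # copy2 also preserves timestamps
        copies.append(target)
    return copies  # three copies in total, per the rule
```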

    For more detailed best practices, consult DataONE.

    Why should you preserve your data?

    Data preservation allows for the access of data and files over time. At a minimum, this includes storing data in a secure location, across multiple locations, and in stable file formats that will remain readable in the future. Ultimately, well-preserved data increases the impact of both the data and the researcher.

    What should be preserved?

    Many elements should be considered when determining whether to preserve a file or a particular data set. In particular, available space, cost, and reproducibility should be discussed. DataONE offers detailed guidelines to help you determine what should be preserved.

    How to prepare data for preservation?

    How preservable your data is depends largely on the file format used. Consider whether the format:

    • Is openly documented (more preservable) or proprietary (less preservable);
    • Is supported by a range of software platforms (more preservable) or by only one (less preservable);
    • Is widely adopted (more preservable) or has low use (less preservable);
    • Uses lossless data compression (more preservable) or lossy compression (less preservable); and
    • Contains embedded files or embedded programs/scripts, like macros (less preservable).
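    The checklist above can be expressed as a simple scoring function; the dictionary keys and equal weights below are our own illustration, not an archival standard.

```python
# The preservability checklist above as a toy scoring function.
# Keys and equal weighting are illustrative, not an archival standard.
def preservability_score(fmt):
    """Count how many 'more preservable' traits a format description has (0-5)."""
    return sum([
        fmt.get("openly_documented", False),    # open spec beats proprietary
        fmt.get("multi_platform", False),       # supported by many platforms
        fmt.get("widely_adopted", False),       # broad user base
        fmt.get("lossless", False),             # lossless beats lossy compression
        not fmt.get("embedded_scripts", True),  # macros/embedded files hurt
    ])

# e.g. plain CSV satisfies every criterion
csv_traits = {"openly_documented": True, "multi_platform": True,
              "widely_adopted": True, "lossless": True, "embedded_scripts": False}
```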

    Courtesy of the University of Illinois Urbana-Champaign


  • Share and Discover

    Why should you share your data?

    Making your data available is now a requirement from most funders (federal and public) and publishers. Benefits of sharing your data include:

    • Furthering scientific inquiry
    • Transparency and reproducibility
    • Improving your scholarly impact
    • Safeguarding your data

    Share working data:

    Version control software can help you keep track of changes made to the data. These tools also facilitate transparent research, reproducibility, and documentation (see Data Management Research Guide).

    Some examples of version control systems that allow for data sharing include:

    Public access policies:

    In 2013 the Office of Science and Technology Policy (OSTP) released a memorandum directing Federal agencies with more than $100M in annual R&D expenditures to develop plans to make the published results of federally funded research freely available to the public within one year of publication, and requiring researchers to better account for and manage the digital data resulting from federally funded scientific research.

    Select overviews of federal public access policies:

    Repositories for sharing and finding:

    Repositories are libraries or warehouses of research datasets, often centered on a specific discipline. They allow for the preservation, storage, and discoverability of published data.

    Things to consider when selecting a data repository:

    • What is the required/desired level of access?
    • Where will your data live?
    • Does your discipline have a data repository?
    • Does your funder require the use of a particular repository?
    • Are there HIPAA or privacy restrictions?

    Find a repository to meet your needs:

    Improving access and impact of your data:

    Assigning a DOI (digital object identifier) and making your data available under a Creative Commons license will improve the visibility and impact of your data by increasing discoverability and access for a wider audience.


  • About Us

    Raquel Horlick
    Coordinator for Scholarly Resources, Sciences and Engineering
    Howard-Tilton Memorial Library

    Rachel Tillay
    Cataloging & Metadata Librarian
    Howard-Tilton Memorial Library

    Eric Wedig
    Coordinator for Scholarly Resources, Social Sciences
    Howard-Tilton Memorial Library

    Laura Wright
    Research Support Librarian
    Matas Library

    Courtney Kearney
    Scholarly Engagement Librarian, Physical Sciences and Data Management
    Howard-Tilton Memorial Library