Data Stewardship plans
Chapter 1: Administrative data
Research Data Life Cycle
Chapter 2:
Data cycle step 1: Reusing data
Is there pre-existing data?
Will you use Pre-existing data (including Other People’s Data)?
Reference data can be:
1. Core resources like UniProt or PDB
2. Things like a "human reference genome" that you use to define how your data differs
What existing (non-reference) data sets will you use?
Do you need to harmonize different sources of existing data?
Do you know what data already exists?
Will you use any data that needs to be made computer readable first?
(1.11) What/how/who will integrate existing data
(1.11.1) Will you need to add data from literature?
(1.11.2) Do you need to integrate or link to a different type of data?
Will you allow others to contribute data (open contribution)?
Chapter 3:
Data cycle step 2: Creating data
Data interoperability
For each data format you will be using:
Is this a standard format?
Does this format enable sharing and long term archiving?
Will you be converting to a format more suitable for archiving later? (G F6; a conversion sketch follows this list)
What volume of data of this format do you expect?
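A minimal sketch of such a conversion in Python, assuming a tabular working file; the file names are placeholders, and pandas.read_excel additionally needs the optional openpyxl dependency for .xlsx files:

```python
# Minimal sketch: convert a proprietary working format to an open, archivable
# one. File names are placeholders.
import pandas as pd

df = pd.read_excel("measurements.xlsx")     # proprietary working format
df.to_csv("measurements.csv", index=False)  # plain-text copy for archiving
```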
Will you be using new types of data?
Do you need to create vocabularies or ontologies for any of your data items?
How will you design the format for your data?
Will you describe your data format for others?
How will you be storing metadata? (A metadata-capture sketch follows below.)
Do suitable “Minimal Metadata About” standards exist for you?
Do you know how and when you will be collecting the necessary metadata?
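A minimal sketch in Python of capturing a structured metadata record at collection time rather than reconstructing it later; all field names and values are illustrative, not a prescribed schema, and a real plan would follow a community "Minimum Information" checklist for the domain:

```python
# Minimal sketch: write one structured metadata record per data set.
import json
from datetime import date

record = {
    "study_id": "EXAMPLE-001",             # hypothetical local identifier
    "title": "Example study",
    "collected_on": date.today().isoformat(),
    "organism": "Homo sapiens",
    "assay_type": "RNA-seq",               # ideally an ontology term (see OLS)
    "contact": "data.steward@example.org",
}

with open("EXAMPLE-001.metadata.json", "w") as f:
    json.dump(record, f, indent=2)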
Did you consider re-usability of your data beyond your original purpose?
Do you need to exchange your data with others?
Did you consider how to monitor data integrity?
How will you make sure data are what they should be?
Will you keep checksums of certified/verified/correct/canonical data? (G F2)
Will you define ways to detect file/sample swaps, e.g. by measuring something independently? (G F5)
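A minimal sketch of the checksum idea above in Python; the directory layout and the two-space manifest format are arbitrary choices for illustration:

```python
# Minimal sketch: build and later verify a SHA-256 manifest for canonical
# data files.
import hashlib
from pathlib import Path

def sha256sum(path: Path) -> str:
    """Stream the file through SHA-256 so large files fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(data_dir: Path, manifest: Path) -> None:
    with manifest.open("w") as out:
        for p in sorted(data_dir.rglob("*")):
            if p.is_file():
                out.write(f"{sha256sum(p)}  {p.relative_to(data_dir)}\n")

def verify_manifest(data_dir: Path, manifest: Path) -> list[str]:
    """Return relative paths whose current checksum no longer matches."""
    mismatches = []
    for line in manifest.read_text().splitlines():
        digest, rel = line.split("  ", 1)
        if sha256sum(data_dir / rel) != digest:
            mismatches.append(rel)
    return mismatches
```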
Does all data have a license?
Will you store licenses with the data?
How will you keep provenance?
How will you do file naming and file organization?
Agree on an SOP for naming files (G F4; a naming-check sketch follows this block)
How will you handle file versioning?
How will you ensure consistent usage of the file naming?
Is all metadata that is in the file names also available in the proper metadata?
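One way to enforce a naming SOP automatically; a minimal Python sketch in which the pattern (project_sample_assay_date_version.ext) is a made-up example of such a convention, to be replaced by whatever your project agrees on:

```python
# Minimal sketch: report files that violate an agreed naming SOP.
import re
from pathlib import Path

NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_"
    r"(?P<sample>[A-Za-z0-9-]+)_"
    r"(?P<assay>[a-z]+)_"
    r"(?P<date>\d{8})_"
    r"v(?P<version>\d+)\.\w+$"
)

def check_names(data_dir: Path) -> list[str]:
    """Return the names of files that do not match the naming SOP."""
    return [p.name for p in data_dir.rglob("*")
            if p.is_file() and not NAME_PATTERN.match(p.name)]
```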
It is not self-evident that you always need to collect new data for a study; more and more, re-use of existing data can accomplish the same thing.
Which experimental data will you collect?
How many subjects do you need to be able to get statistically meaningful results?
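A worked example of the sample-size question, as an a-priori power analysis with statsmodels; the effect size, alpha, and power values are placeholder assumptions and your study design determines the real ones:

```python
# Worked example: a priori sample size for a two-group comparison.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.5,  # assumed standardized effect size (Cohen's d)
    alpha=0.05,       # significance level
    power=0.8,        # desired statistical power
)
print(f"Subjects needed per group: {n_per_group:.0f}")  # ~64 with these inputs
```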
Selection of analysis technique
Which database will you use to store the data?
Are there any data format considerations?
(1.15.1) What is the volume of each anticipated data set
(1.15.2) What data formats do the machines yield
(1.15.3) What preprocessing is needed
Are there potential issues regarding data ownership and access control?
(1.16.1) Who needs access?
(1.16.2) Where will servers be placed?
(1.16.3) What level of data protection is needed
(1.16.4) What will the IP situation be?
(1.18.4.1) Who will decide about opening up data, e.g. after the project finishes?
How do you take care of quality control of data capture?
Are you logging what happens exactly to samples?
Will different collection sites be using comparable protocols, formats and identifiers?
Harmonize?
Will your data be able to answer your scientific question?
Give a list of data sets you will acquire using equipment.
Is the data capture equipment and protocol completely standardized?
Is special care needed to get the raw data ready for processing?
Do you have non-equipment data capture?
Questionnaires?
Case report forms?
Electronic patient records?
Specify a list of data sets
Is there a proper data integration tool that can handle and combine all the data types you are dealing with in your project?
Will you be storing samples?
Chapter 4:
Data cycle step 3: Processing data
Data Processing Setup
Are you using a Virtual Research Environment for compute and data sharing?
Will you need a shared working space to work with your data?
How will you work with your data?
What kind of data will be in your workspace?
Do you need the storage close to compute capacity?
Will you keep data in a work format that is different from the archival format?
What is the capacity profile? Will you need the same storage quantity during the whole project?
Will you need to temporarily archive data sets (e.g. to tape)?
If you will be starting with a high volume of data, how will that initial data come in?
How will project partners access the work space?
How available must the workspace be?
What is the acceptable risk for “total loss”?
Can all files in the workspace be recomputed quickly?
Is there software in the workspace?
What percentage of time should the data be available? During work hours? Nights? On weekends?
How will you do backups and other copy data management? (A backup sketch follows below.)
Do you need to backup any data stored elsewhere related to your project in your workspace?
If not: Are all data from all project members adequately backed up and traceable?
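A minimal backup sketch, assuming rsync is installed and that the source and destination paths are placeholders; the --checksum flag makes rsync compare file content rather than timestamps:

```python
# Minimal sketch: mirror the workspace to a backup location with rsync.
import subprocess

subprocess.run(
    ["rsync", "-a", "--checksum", "--delete",
     "/data/workspace/", "/backup/workspace/"],
    check=True,
)
```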
Is access control to the files in the working area well arranged?
Make sure to give write access only to people who need it
Make sure to give read access only to people that are explicitly allowed, especially when privacy sensitive data are involved
Is there a process in place for offboarding leaving project members?
Removing access
Is there a process in place for onboarding new project members?
Giving access
Instructing about responsibilities and accountability
Developing Workflows: Has this been arranged or is more guidance desired?
Will you be running a bulk/routine workflow, or develop a research analysis?
What data will workflow developers use? Can workflow developers work with a subset of new data? Is there pre-existing data available for this?
List existing software you will use in the analysis workflow.
List new software components you will develop for the analysis workflow.
Did you choose the workflow engine?
What features do you need?
How is the integrity of the tools in the workflow guaranteed? (G H3)
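One possible way to guard tool integrity is to pin expected versions and fail fast on drift before each run; a minimal Python sketch in which the tool names, version flags, and pinned versions are hypothetical:

```python
# Minimal sketch: check workflow tools against pinned versions.
import subprocess

PINNED = {
    "samtools": "1.19",  # illustrative pins, not recommendations
    "bcftools": "1.19",
}

def check_tool_versions() -> None:
    for tool, expected in PINNED.items():
        out = subprocess.run([tool, "--version"],
                             capture_output=True, text=True, check=True)
        first_line = out.stdout.splitlines()[0]
        if expected not in first_line:
            raise RuntimeError(f"{tool}: pinned {expected}, found: {first_line}")
```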
Running Workflows
How will you make sure to know what exactly has been run?
How do you validate the integrity of the results?
Will you run part of the data set repeatedly to catch unexpected changes in results?
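A minimal sketch of recording exactly what has been run: write a small manifest next to each workflow invocation. The field names here are illustrative, not a fixed schema:

```python
# Minimal sketch: record a run manifest for later provenance checks.
import json
import sys
from datetime import datetime, timezone

def write_run_manifest(inputs: list[str], outputs: list[str],
                       path: str = "run_manifest.json") -> None:
    manifest = {
        "command": " ".join(sys.argv),
        "started": datetime.now(timezone.utc).isoformat(),
        "inputs": inputs,    # ideally paths plus their checksums
        "outputs": outputs,
        "python": sys.version,
    }
    with open(path, "w") as f:
        json.dump(manifest, f, indent=2)
```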
Compute Capacity Planning
Determine needs in Memory/CPU/IO ratios
Suitable system
Grid/Cluster/Cloud?
Data Transport needed?
Do you have in house experience with the used compute architectures?
Do your people need training for Grid/Cloud/Hadoop?
Use shared infrastructure?
Purchase special needs?
Is all Compute capacity needed available close to the working storage?
If not, you need to plan the necessary network capacity
If not, can the data legally be transported to the compute capacity?
Will different groups work on different parts of the workflow, and will parts of the computing be done on local infrastructure?
Is there sufficient network capacity?
Do groups have local infrastructure that can be used?
Is the Risk of information loss / leaks / vandalism acceptable?
Is any of the data privacy sensitive?
Do project members store data or software on computers in the lab or external hard drives connected to those computers?
Do people carry data with them?
Are researchers using cloud accounts?
Are data or reports sent over e-mail or other messaging services?
Do the data centers where data is stored have Certifications?
Are all project web services used via https?
Have project members been instructed?
Did you do an impact analysis?
Information loss?
Information leak?
Information vandalization?
What will you do if the compute facility is down?
Chapter 5:
Data cycle step 4: Interpreting data
How will you be doing the integration of different data sources?
For each data type you use, answer the following:
How is the data structured?
Answer: a simple table (data records) for each data set
Answer: complex data, like a graph
Answer: a domain-specific format for this data (e.g. VCF)
Answer: differently per data set
Will you use a workflow e.g. with tools for database access or conversion?
Will you use a linked data approach? (A linked-data sketch follows this block.)
Will you use linked data sources?
Will you make your results available as semantically interoperable data?
Will you be using common or exchangeable units?
Will you be using common ontologies?
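A minimal linked-data sketch with rdflib, typing one record with an ontology term; the subject URI and the EDAM term are examples only:

```python
# Minimal sketch: expose one result as linked data (RDF/Turtle).
from rdflib import RDF, Graph, Literal, Namespace

EX = Namespace("https://example.org/dataset/")
EDAM = Namespace("http://edamontology.org/")

g = Graph()
sample = EX["sample-001"]
g.add((sample, RDF.type, EDAM["data_0006"]))       # EDAM 'Data' (example term)
g.add((sample, EX["measuredValue"], Literal(42)))

print(g.serialize(format="turtle"))
```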
Will there be potential issues with statistical normalization?
Will you be integrating different data sources in order to get more samples or more data points?
Will you be integrating different data sources in order to get more information about the same samples / subjects / data points?
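The two integration modes above map onto two different table operations; a minimal pandas sketch with illustrative table and column names:

```python
# Minimal sketch: row-wise vs column-wise integration of data sources.
import pandas as pd

site_a = pd.DataFrame({"subject": ["s1", "s2"], "value": [1.0, 2.0]})
site_b = pd.DataFrame({"subject": ["s3", "s4"], "value": [3.0, 4.0]})
clinical = pd.DataFrame({"subject": ["s1", "s2"], "age": [34, 57]})

# More samples / data points: stack compatible tables row-wise.
more_samples = pd.concat([site_a, site_b], ignore_index=True)

# More information about the same subjects: join on the shared identifier.
more_info = site_a.merge(clinical, on="subject", how="left")
```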
Do you have all tools to couple the necessary data types?
(LS) Will you be doing structure modeling?
(LS) Will you do systems biology modeling?
Will you be doing (automated) knowledge discovery?
Chapter 6:
Data cycle step 5: Archiving/Publishing data
During the project, will you be archiving data for long term preservation?
Give a list of data sets. For each of these:
What kind of repository?
Answer: self-hosting
Answer: in a domain-specific repository
Answer: in a repository provided by your institute
Answer: in a national repository
Will you be adding this data set to a catalogue?
How will you make sure that blocks of data deposited in different repositories and catalogues can be recognized as belonging to the same study?
Did you work out the financial aspects of making the data available?
Who will pay for open access publishing?
Who keeps data access running? Recurring fees?
Will you be archiving data after the project in “cold storage”?
Will data formats be upgraded if they grow obsolete?
Will storage media be upgraded if they grow obsolete?
Will you also publish if the results are negative?
List of software packages?
Is there an open source license?
Where will it be available?
Will it be listed in a catalogue?
How will you be making sure there is good provenance of the Data Analysis?
Will reference data be created?
What will the IP situation be?
How will you maintain it?
What will the release schedule be?
xref: reuse of existing reference data
Chapter 7:
Data cycle step 6: Giving access to data
Will you be working with the philosophy “as open as possible” for your data?
Can all of your data become completely open immediately?
Are there legal reasons why (some of) your data cannot be completely open?
Privacy reasons?
IP reasons?
Will you be using authenticated access?
Are there business reasons why some of your data can not be completely open? Patents?
Are there other reasons?
Will you use a limited embargo period?
Do you know how your data could be re-used?
Data access committee
Consent
Will there be translational returns / valorization that you can participate in?
Tools referenced in this map:
DMHub
DMPonline
RightField
EBI Data Submission wizard
UseGalaxy
R
WorkflowHub
NeLS
choosealicense
SEEK
Ontology Lookup Service
JWS
BioStudies
NeLS/SBI
transMART
openBIS
various repositories
FAIRsharing
Re3data
jupyter
gitlab/github/...
ISAtools
COPO
Amnesia
EGA, ...
Open Data License Guide