Artifact Structure Best Practices and Considerations
Before importing artifacts to the SEARCCH hub, determine how many artifacts you have and what type each one is. The SEARCCH hub supports the following types of artifacts: software, dataset, publication, presentation, and other.
Note: the hub does not presently support a first-class "experiment" artifact type because experiments do not typically exist as individual components. Publications (e.g., conference papers, dissertations, master's theses) typically document experiments, which are realized using various software components, software configuration files, and datasets. Because of this, experiments typically live across multiple artifacts. If you have a complete, self-contained experiment artifact package, you can submit it as an "other" artifact.
You may have one artifact or multiple artifacts. For example, you may have software and data that only work together. In this case, publishing the two as one combined artifact makes sense. If the software and data are each useful on their own, importing them as two separate artifacts and linking them through a supported relationship is the better choice. We expect a common case will be submitting four related artifacts: software, data, a paper, and a presentation.
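Before importing, it can help to write down the artifacts you plan to create and how they relate. The following is a minimal planning sketch for the four-artifact case. It does not call the hub. The PlannedArtifact class, its fields, the link() helper, and all titles and URLs are hypothetical, and the links are untyped because the hub's relationship names are not listed in this guide.

```python
from dataclasses import dataclass, field

# The five type names come from this guide; the class, fields, and
# link() helper are hypothetical planning aids, not the hub's API.
ARTIFACT_TYPES = {"software", "dataset", "publication", "presentation", "other"}


@dataclass
class PlannedArtifact:
    title: str
    kind: str
    url: str
    related: list = field(default_factory=list)  # titles of linked artifacts

    def __post_init__(self):
        if self.kind not in ARTIFACT_TYPES:
            raise ValueError(f"unsupported artifact type: {self.kind}")


def link(a, b):
    """Record an undirected link between two planned artifacts."""
    if b.title not in a.related:
        a.related.append(b.title)
    if a.title not in b.related:
        b.related.append(a.title)


# Placeholder titles and URLs for the four-artifact case.
software = PlannedArtifact("Example detector", "software", "https://example.org/detector")
data = PlannedArtifact("Example traces", "dataset", "https://example.org/traces")
paper = PlannedArtifact("Example paper", "publication", "https://example.org/paper.pdf")
talk = PlannedArtifact("Example talk", "presentation", "https://example.org/talk.pdf")

# Link the paper to everything it documents, and the code to its data.
for other in (software, data, talk):
    link(paper, other)
link(software, data)

for artifact in (software, data, paper, talk):
    print(artifact.kind, "->", sorted(artifact.related))
```

A plan like this makes it easy to see whether two items should be merged into one artifact (they are never useful apart) or kept separate and linked.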
Software artifacts can be implementations of research algorithms described in publications; tools used in experiments (e.g., traffic generators, testbed or experiment control programs); experiment setups; system images (e.g., a virtual machine or Docker image); and the like.
Generally speaking, a software repository will be a single artifact. A single source file should not typically be submitted as an artifact, although there could be rare cases where this makes sense. An RPM, DEB, or MSI installer file could also be considered a single artifact in some circumstances. However, it is usually better in such cases to package the individual file with a README that explains what the file is and how to use it.
Experiment setups are a special case of software artifacts. They typically consist of a set of programs or scripts and configuration files (e.g., ns files). Experiment setups can be packaged with research software or separately, if desired. If an experiment setup could be useful to other researchers in the future, separate packaging may make sense.
Datasets can be inputs to algorithms used in experiments (e.g., ML model training data, PCAP data), interim outputs from processing (e.g., trained models), or final outputs (e.g., algorithm detections). Data items such as these are important to experiment reproducibility and validation. A dataset artifact could also be a repository of multiple related datasets.
Sometimes datasets are stored in the same repository as software, but there may be good reason to create separate code and data artifacts in the hub. There is no need to refactor your repository in such a case. The hub supports users creating two artifacts of different types that both point to the same URL. Simply create a software artifact and a dataset artifact that each have the same primary URL.
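As an illustration of the shared-URL case, the sketch below builds two artifact records that differ in type but point to the same primary URL. The function name, the dictionary keys, and the example URL are assumptions made for illustration; they do not reflect the hub's actual import form or API.

```python
def artifacts_for_shared_repo(title, url):
    """Return a software record and a dataset record sharing one primary URL.

    The record layout is hypothetical; only the idea of two artifact
    types pointing at the same URL comes from this guide.
    """
    return [
        {"type": "software", "title": f"{title} (code)", "url": url},
        {"type": "dataset", "title": f"{title} (data)", "url": url},
    ]


# Placeholder repository holding both code and data.
entries = artifacts_for_shared_repo(
    "Example traffic study", "https://example.org/traffic-study"
)
for entry in entries:
    print(entry["type"], entry["title"], entry["url"])
```

The repository itself is left untouched; only the two hub entries differ.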
Publications can be papers, technical reports, dissertations, master's theses, etc. Papers describe research, experiments, and evaluations that were performed using software and datasets. Publication artifacts will typically be a single PDF-formatted file but could be a pointer to a web page that displays the publication contents.
Presentations are slides, audio, and/or video discussions of research efforts and results. These artifacts will typically be a single file but could be two or more files (e.g., an audio or video presentation plus the slides as a separate file).
The "other" artifact type is for situations where you may have some unique artifact that does not fit into the software, dataset, publication, or presentation categories. For example, if you have a website that pulls together all artifacts associated with the same experiment, you could create an "other" artifact using the URL, separately import each related artifact, and then link them all together.