
At my company, we use a distributed pool of virtual machines to run our UI and API tests. These machines are all connected to an on-site server that the pool uses to publish test results and outputs.

The problem is that storage on this server is limited, and each day we produce 500 MB - 5 GB of reports (CSVs, screenshots, text logs, etc.). We would like to preserve these reports to help QA identify issues, but we routinely end up having to delete large batches of them to free up space.

Recently, we have moved our test scripts and inputs to a Git repo on VSTS. This not only frees up some space on our test server, but also allows for source control.

We want to do the same with the test outputs. The only issue is that this repo would be MASSIVE, larger than the tiny local storage allotted to each test machine. Everything I've found online suggests that each machine would need a full clone of the repo in order to push to it, which makes this approach unworkable.

My question is: how can I make this work? Is there a way to push an individual file or collection of files to a VSTS repo without cloning it locally first? I've looked at Git submodules, but I'm unsure how reliable or stable that would be, since getting this repo down to a reasonable size would require about 1,500 submodules. Is there a better solution for storing large amounts of test output data?

  • Have you looked at [VFS for Git](https://github.com/Microsoft/VFSForGit)? It was developed by Microsoft for their Windows repository. I don't have any experience using it, but it sounds like it solves a similar problem. The basic idea is that you only check out the files you actually need. – MAV Feb 06 '19 at 17:22
  • I just cannot imagine saving test results into a source repository as anything but crazy. I just cannot comprehend how that would make sense. You should just grab a normal network file drive and save the test results there. Also enable "remove old builds" in your build server, so that old stuff is removed automatically. – Euphoric Feb 06 '19 at 17:32
  • I agree with @Euphoric - archiving historical files is not the use case for source control repositories. Those test result files are not going to change over time. – Eric King Feb 06 '19 at 17:54
  • If all of your source is properly version-controlled, any set of tests should be reproducible by checking out that version and re-running them. – Blrfl Feb 06 '19 at 19:15

1 Answer


I am going to rephrase your problem slightly:

We would like to capture the test results and store them for later review.

I believe that you should be trying to capture these test results as files with associated metadata, where the metadata allows you to search for and review collections of files based on a number of criteria.

Content Management Is Key

Right now, you are encoding your metadata in the file name, the directory hierarchy, the source server type, and the file system information. If you make your metadata explicit, using document content models, you could then store your test data in a content management system.
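To make that distinction concrete, here is a small sketch (in Java, since the tools mentioned below are Java-based) of the same information held implicitly in a path versus explicitly as named fields. Every path, field name, and value in it is hypothetical:

```java
import java.util.Map;

public class MetadataExample {
    public static void main(String[] args) {
        // Implicit today: everything QA needs to know is buried in a path like
        //   \\test-server\results\2019-02-06\ABC-123\run-7\screenshot-003.png
        // Explicit instead: the same facts become named, queryable fields that
        // travel with the file. The field names here are purely illustrative.
        Map<String, String> metadata = Map.of(
                "testId", "ABC-123",
                "runDate", "2019-02-06",
                "runNumber", "7",
                "release", "4.2.1",
                "result", "failed",
                "artifactType", "screenshot");
        metadata.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```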

A content management system does not have to be a commercial product. For instance, you could use Apache Jackrabbit as your API for handling the metadata, along with a conventional file system to store the actual files. Open-source products, such as Alfresco, use Jackrabbit under the hood to provide a useful web-based GUI for queries and reports.
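As a rough sketch of what that could look like, the following stores one test artifact plus its metadata through the JCR API (javax.jcr) that Jackrabbit implements. The node names, property names, and credentials are illustrative assumptions, not anything specific to your environment:

```java
import java.io.FileInputStream;
import java.util.Calendar;
import javax.jcr.Binary;
import javax.jcr.Node;
import javax.jcr.Repository;
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;
import org.apache.jackrabbit.core.TransientRepository;

public class StoreTestArtifact {
    public static void main(String[] args) throws Exception {
        // TransientRepository starts an embedded Jackrabbit instance for this demo.
        Repository repository = new TransientRepository();
        Session session = repository.login(
                new SimpleCredentials("admin", "admin".toCharArray()));
        try {
            // One node per test run, with the metadata as plain JCR properties.
            Node root = session.getRootNode();
            Node runs = root.hasNode("testRuns") ? root.getNode("testRuns")
                                                 : root.addNode("testRuns");
            Node run = runs.addNode("ABC-123-run-7");   // illustrative name
            run.setProperty("testId", "ABC-123");
            run.setProperty("release", "4.2.1");
            run.setProperty("result", "failed");
            run.setProperty("executedAt", Calendar.getInstance());

            // Attach the artifact itself as a standard nt:file child node.
            Binary data = session.getValueFactory()
                    .createBinary(new FileInputStream("screenshot-003.png"));
            Node file = run.addNode("screenshot-003.png", "nt:file");
            Node content = file.addNode("jcr:content", "nt:resource");
            content.setProperty("jcr:mimeType", "image/png");
            content.setProperty("jcr:data", data);

            session.save();
        } finally {
            session.logout();
        }
    }
}
```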

With a good metadata model, it would be possible to issue search queries such as "How often did test ABC-123 fail in the last six months?" or "Provide all the test files associated with release 4.2.1."
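Assuming the illustrative node layout and property names from the previous sketch, the first of those questions could be expressed as a JCR-SQL2 query along these lines:

```java
import java.util.Calendar;
import javax.jcr.Session;
import javax.jcr.query.Query;
import javax.jcr.query.QueryManager;
import javax.jcr.query.RowIterator;

public class CountRecentFailures {

    // Counts how often a given test failed since a cutoff date, using the
    // illustrative "testId" / "result" / "executedAt" properties shown above.
    public static long countFailures(Session session, String testId, Calendar since)
            throws Exception {
        QueryManager qm = session.getWorkspace().getQueryManager();
        Query query = qm.createQuery(
                "SELECT * FROM [nt:unstructured] AS run "
                + "WHERE run.[testId] = $testId "
                + "AND run.[result] = 'failed' "
                + "AND run.[executedAt] >= $since",
                Query.JCR_SQL2);
        query.bindValue("testId", session.getValueFactory().createValue(testId));
        query.bindValue("since", session.getValueFactory().createValue(since));

        RowIterator rows = query.execute().getRows();
        long count = 0;
        while (rows.hasNext()) {
            rows.nextRow();
            count++;
        }
        return count;
    }
}
```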

Git Is Not a Fit

A software configuration management repository is not a good fit. You would still need to encode your metadata in awkward and idiosyncratic ways, and you would still be left with the problem of aging out your data.

BobDalgleish