There is a third way, as you said yourself. I think you are mixing up development, testing and deployment. I propose looking at the whole SDLC first, to understand what it is you are trying to achieve. This is a big topic, but I will do my best to summarise.
TL;DR
In short, you need to separate:
- your code, from
- the application configuration, from
- the system environment configuration.
Each needs to be independent of the others and suitably (see the sketch after this list):
- version controlled
- tested
- deployable
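To make the separation concrete, here is a minimal sketch in Python, assuming a hypothetical `app_config.json` for the application configuration and environment variables (as Docker would inject them with `docker run -e`) for the environment configuration:

```python
# app.py - the code is one artefact; it consumes, but never contains,
# the other two.
import json
import os

# Application configuration: version-controlled alongside (but separate from)
# the code - e.g. feature flags and tuning that are the same in every environment.
with open("app_config.json") as f:  # hypothetical file name
    app_config = json.load(f)

# Environment configuration: injected by the environment itself (docker run -e,
# CI secrets, an orchestrator) and never committed with the code.
env_config = {
    # Hypothetical names; the hard lookup fails fast if the environment is unset.
    "database_uri": os.environ["DATABASE_URI"],
    "listen_port": int(os.environ.get("LISTEN_PORT", "8080")),
}

def run(app: dict, env: dict) -> None:
    print(f"starting on port {env['listen_port']} "
          f"with features {app.get('features', [])}")

if __name__ == "__main__":
    run(app_config, env_config)
```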
Longer Version
First, you have an application made up of code and (separate sets of) configuration. This needs to be tested, for both build and intentional function - this is called continuous integration (CI). There are many providers of this service, both online and local - for example, CircleCI is a cloud provider that links to your repository and builds and tests whenever you commit. If your repository is on-prem and cannot use a cloud provider, something like Jenkins would be an equivalent. If your application is fairly standard, there is probably an existing Docker image that the CI service can use; if not, you will have to create one, or a cluster of them, that your application code and configuration can be deployed to. Correctly configured, the CI service will give you a wealth of statistics on the quality of your application code.
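To make "tested for intentional function" concrete, here is a minimal sketch of the kind of unit test a CI service would run on every commit, using pytest; `make_widget` is a hypothetical stand-in for your real code:

```python
# test_app.py - the sort of check a CI service (CircleCI, Jenkins, ...) runs
# on every commit, inside a Docker image that carries your toolchain.

def make_widget(name: str) -> dict:
    # Hypothetical application code, standing in for your real module.
    return {"name": name, "status": "new"}

def test_make_widget_sets_status():
    # Intentional function: the behaviour we promised, not just "it builds".
    assert make_widget("gadget")["status"] == "new"

def test_make_widget_keeps_name():
    assert make_widget("gadget")["name"] == "gadget"
```

Run with `pytest test_app.py`; the quality statistics mentioned above (pass rates, coverage trends) come from exactly these runs.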
Next, once you are satisfied with the functionality and correctness of your application, the codebase should be suitably tagged for a specific release. This build should then be deployed to a test environment. Note that the code will be the same as that tested in your CI (provably so, if you have done this correctly), but your configuration may differ. Again, some CI providers offer this step, so you can test the deployment of a packaged application and its discrete configuration. This stage typically includes user functional testing (for new functionality) as well as automated testing (for known functionality). If the release passes this stage, you have a release candidate for integration testing. You can run the automation tests from another Docker container; depending on your application, these may be as large and elaborate as your application itself - indeed, there are some metrics that state testing effort is 1:1 with coding effort (though I am unsure of this myself).
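A minimal sketch of such an automated test, run from a separate Docker container against the deployed release candidate; the `APP_URL` variable and the `/health` endpoint are assumptions for illustration:

```python
# integration_test.py - run from its own Docker container against the test
# environment; only the configuration (here, APP_URL) differs from the CI run.
import os
import urllib.request

# Hypothetical variable pointing at the deployed release candidate.
APP_URL = os.environ.get("APP_URL", "http://app-under-test:8080")

def test_health_endpoint():
    # Known functionality: an automated regression check, assuming the
    # application exposes a /health endpoint.
    with urllib.request.urlopen(f"{APP_URL}/health", timeout=5) as resp:
        assert resp.status == 200

if __name__ == "__main__":
    test_health_endpoint()
    print("release candidate responded as expected")
```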
Penultimately, you build your (system) environment as if it were production. If you are using Docker in production, this is where you will think about security hardening, network and server optimisation, etc. Your Docker images may be based on those you used in development (ideally so), but there may be changes for scaling and security, as I said. By now the functional testing of the application should be complete; here you are more concerned with security and performance. As with the functional testing, your tests here can be developed, deployed and run from other Docker images. This step used to be horrifically expensive and was rarely done, as it required dedicated hardware that reproduced production. Today it is completely viable, as you can stand up and tear down a whole environment of almost any scale on demand.
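As a sketch of the kind of performance check that can run from its own Docker image - a real load-testing tool would do far more, and the `TARGET` variable and one-second budget are assumptions for illustration:

```python
# perf_smoke.py - a crude latency check of the production-like environment.
import concurrent.futures
import os
import time
import urllib.request

# Hypothetical address of the production-like deployment.
TARGET = os.environ.get("TARGET", "http://staging-app:8080/health")

def timed_request(_):
    # Time a single round trip to the target endpoint.
    start = time.monotonic()
    with urllib.request.urlopen(TARGET, timeout=10) as resp:
        resp.read()
    return time.monotonic() - start

if __name__ == "__main__":
    # Fire 50 requests across 10 worker threads and report the worst latency.
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
        latencies = list(pool.map(timed_request, range(50)))
    worst = max(latencies)
    print(f"worst latency: {worst:.3f}s over {len(latencies)} requests")
    assert worst < 1.0, "slower than the (assumed) 1 second budget"
```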
Finally, you have a release that should be production-ready, with only a small set of configuration deltas from your integration testing (IP addresses, database URIs, passwords, etc.). Your codebase has been tested in at least three different environments at this point, and the majority of the system configuration at least once.
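You can even test that those deltas are only the expected ones. A minimal sketch, assuming hypothetical `config.integration.json` and `config.production.json` files and a whitelist of keys that are allowed to differ:

```python
# config_delta_check.py - fail the release if the production configuration
# drifts from integration in any way we did not explicitly allow.
import json

# Hypothetical whitelist: the only keys that may differ between environments.
ALLOWED_DELTAS = {"host_ip", "database_uri", "admin_password"}

def unexpected_deltas(integration: dict, production: dict) -> set:
    # Keys present in either config whose values differ between the two.
    differing = {key for key in integration.keys() | production.keys()
                 if integration.get(key) != production.get(key)}
    return differing - ALLOWED_DELTAS

if __name__ == "__main__":
    with open("config.integration.json") as f:
        integration = json.load(f)
    with open("config.production.json") as f:
        production = json.load(f)
    drift = unexpected_deltas(integration, production)
    assert not drift, f"unexpected configuration drift: {sorted(drift)}"
    print("only the expected deltas differ between environments")
```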