You could try adding the same delay to both outputs. The trick is to introduce logic that cannot be optimized away but still adds a LUT delay.
You are probably familiar with using EXOR gates to conditionally invert a signal.
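As a reminder of how that works, here is a minimal Verilog sketch. The module and port names are made up for illustration. When `inv` is 0 the XOR passes `x` through unchanged, and when `inv` is 1 it inverts it:

```verilog
// Conditional inversion with an XOR (illustrative names only).
//  inv x | y
//   0  0 | 0
//   0  1 | 1
//   1  0 | 1
//   1  1 | 0
module cond_invert (
  input  wire x,    // data signal
  input  wire inv,  // control: 1 = invert x, 0 = pass x through
  output wire y
);
  assign y = x ^ inv;
endmodule
```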
I would add an EXOR function to both output ports. One EXOR's "control" signal is high, the other's is low. The control signal into each EXOR gate must be something that could change, e.g. a register which you can write a one or a zero to. You will never actually do that, but the synthesis tool does not know it, so it has to keep the EXOR gate. It can't optimize it away.
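A minimal sketch of that register-controlled version, assuming the control register sits behind some CPU or bus interface. The module name `xor_delay_match` and the ports `sig`, `wr_en` and `wr_data` are hypothetical; nothing here is a Xilinx primitive or library API. Both outputs pass through an XOR, and the control bits reset to values that pass `sig` straight through on `o1` and invert it on `o2`:

```verilog
// Sketch: give o1 and o2 matching logic by passing BOTH through an XOR
// whose control bit comes from a register the tools cannot treat as constant.
// wr_en / wr_data are hypothetical; in a real design they would be decoded
// from a bus write. Software never writes the register, but synthesis
// cannot know that, so it has to keep both XORs.
module xor_delay_match (
  input  wire       clk,
  input  wire       reset_n,
  input  wire       sig,      // signal to drive on o1, and inverted on o2
  input  wire       wr_en,    // hypothetical register write strobe
  input  wire [1:0] wr_data,  // hypothetical register write data
  output wire       o1,
  output wire       o2
);
  // ctrl[0] = 0: o1 passes sig through
  // ctrl[1] = 1: o2 inverts sig
  reg [1:0] ctrl;

  always @(posedge clk or negedge reset_n)
  begin
    if (!reset_n)
      ctrl <= 2'b10;
    else if (wr_en)
      ctrl <= wr_data;
  end

  // Each output goes through an XOR, so after mapping each should
  // have a LUT in its path.
  assign o1 = sig ^ ctrl[0];
  assign o2 = sig ^ ctrl[1];
endmodule
```

This only holds as long as `wr_en` really can be asserted. If it is tied to 0 somewhere in the hierarchy, the tools may see that `ctrl` never leaves its reset value and remove the XORs again.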
Yesterday I tried to prevent logic from being optimised away using various Xilinx constraints, but failed. In the end I used the certain-to-work method I described above, but with an input pin as the control signal, to make sure that the non-inverting LUT is not optimised away:
```verilog
//
// Same delay path for o1 and o2
// where o2 = ~o1
//
module keep (
  input  clk,
  input  reset_n,
  input  never_changes, // Always low
  output o1, o2
);

  // Some (arbitrary) test registers
  reg [1:0] count;

  always @(posedge clk or negedge reset_n)
  begin
    if (!reset_n)
      count <= 2'b0;
    else
      count <= count + 2'b01;
  end

  /*
  This did not work:
  XOR2 X1(.I0(count[1]),.I1(1'b0),.O(o1));
  // synthesis attribute optimize of X1 is off
  XOR2 X2(.I0(count[1]),.I1(1'b1),.O(o2));
  // synthesis attribute optimize of X2 is off
  */

  // This can't fail. never_changes must be low, so o1 = count[1].
  // Because never_changes is a real input pin, the tools cannot remove
  // the XOR, so o1 also goes through a LUT, like the inverter on o2.
  assign o1 = never_changes ^ count[1];
  assign o2 = ~count[1];
endmodule
```
This is the result of the outputs after place & route:
