Edit: Imagine one obtained data from an Earth-like planet through an observation (satellite etc.) at a given moment. Would current weather models allow us to compute what the weather was before that time?
There is a concept called backtracking, which is used, for example, to calculate where particulates were first emitted (we measure some plume and calculate that it originated 18 hours ago at a particular source). But this assumes knowledge of present and past winds, and is therefore different from what you ask.
Reversing the time axis would involve water falling up to the clouds and cyclones moving equatorward and turning into potential vorticity. It seems involved, and I'm not sure what the benefit would be. I don't think this can be done with existing models, and it would be a lot of effort to make it work.
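As a rough illustration of backtracking with known winds, here is a minimal sketch that steps a parcel position backwards in time with a simple Euler scheme. The function `back_trajectory`, the `wind(position, t)` callback and the uniform 5 m/s wind are all hypothetical choices for this example, not part of any particular dispersion model.

```python
import numpy as np

def back_trajectory(position, wind, t_obs, duration, dt=60.0):
    """Trace a parcel backwards from its observed position.

    position : (x, y) in metres at observation time t_obs
    wind     : hypothetical callback wind(position, t) -> (u, v) in m/s;
               it must be known for the whole past interval
    duration : how far back to go, in seconds
    """
    pos = np.asarray(position, dtype=float).copy()
    t = t_obs
    steps = int(round(duration / dt))
    for _ in range(steps):
        # Moving backwards in time: subtract the displacement the wind caused.
        pos = pos - dt * np.asarray(wind(pos, t), dtype=float)
        t -= dt
    return pos

# Assumed toy wind field: uniform 5 m/s towards the east at all times.
def uniform_wind(position, t):
    return (5.0, 0.0)

# Plume observed at the origin; where was it 18 hours earlier?
source = back_trajectory((0.0, 0.0), uniform_wind, t_obs=0.0, duration=18 * 3600.0)
print(source)
```

The key point is that `wind` is an input here: the past flow is already known, so nothing has to be reconstructed from the present state alone.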
¹ It's not as simple as that, because the full "actual state" also involves modelling to "fill in the gaps" in time and space, between all the times and places where we have measurements. This is known as re-analysis.
Well, if we can do that, then surely we could also solve in the inverse time direction, by considering the equation $$ \frac{\partial}{\partial t} u(-t) = -D(u(-t)) $$ and running the integrator with $\tilde t = -t$, $\tilde D = -D$?
Actually, you quickly run into problems when you try that. The operator $D$ can be characterised by its Jacobian, which basically tells you how perturbations in the state influence the derivative. Specifically, the complex eigenvalues of the Jacobian tell you whether a small deviation will a) amplify over time (positive real part), b) decay (negative real part), or c) just oscillate (purely imaginary).
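To make the a) / b) / c) classification concrete, here is a small sketch using a made-up linear operator $D(u) = Ju$, so that the Jacobian is simply the matrix $J$. The matrix entries, the helper `classify_modes` and the tolerance are illustrative assumptions; a real weather model has a huge, state-dependent Jacobian.

```python
import numpy as np

def classify_modes(jacobian, tol=1e-9):
    """Label each eigenvalue by the sign of its real part."""
    labels = []
    for lam in np.linalg.eigvals(jacobian):
        if lam.real > tol:
            labels.append("amplify")    # case a)
        elif lam.real < -tol:
            labels.append("decay")      # case b)
        else:
            labels.append("oscillate")  # case c)
    return labels

# Toy Jacobian: one rotating pair (eigenvalues +i and -i), two damped modes,
# and one weakly growing mode.
J = np.diag([0.0, 0.0, -0.5, -2.0, 0.3])
J[0, 1], J[1, 0] = -1.0, 1.0

forward_labels = classify_modes(J)
# Reversing time replaces D by -D, which flips the sign of every eigenvalue.
backward_labels = classify_modes(-J)

print("forward: ", sorted(forward_labels))
print("backward:", sorted(backward_labels))
```

In this toy case the oscillating pair is unaffected by the sign flip, while every decaying mode becomes an amplifying one and vice versa.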
For physical systems the eigenvalues tend to be mostly c) or b): you get a lot of wave-like solutions which propagate / oscillate over the system, and tend to decay over time. a), however, is trickier: if you start with a small deviation from the start state, the system will deviate more and more over time. Now, this kind of thing is by no means unheard of, especially in meteorology; it's the essence of a chaotic system. Storms can emerge and grow stronger over time, but only by scooping up energy that's already stored in the system. At some point they'll stop.
OTOH, you always have a lot of consistently negative real-part eigenvalues. These correspond to dissipative effects: small-scale perturbations are generally smoothed out to zero by physical effects, e.g. winds have friction, mixing of air of different temperatures averages out the differences, etc. If you now run the simulation backwards, you turn those negative real parts into positive real parts, and that means the system is suddenly massively chaotic on all length scales. Small perturbations arise out of numerical uncertainties and grow without bound. You would not only end up with states different from the actual weather a week ago, but with states that are completely unlike anything the weather has ever been like: huge, erratic temperature fluctuations and small vortices with crazy wind speeds.
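The blow-up can be reproduced with a far simpler dissipative system than the atmosphere. The sketch below uses 1D periodic diffusion, integrated with explicit Euler steps, as a stand-in for $D$. The grid size, time step, diffusivity and the $10^{-12}$ "observation error" are assumptions chosen for illustration.

```python
import numpy as np

def laplacian(u, dx):
    """Second-order periodic finite-difference Laplacian."""
    return (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / dx**2

def integrate(u, nu, dx, dt, steps, direction=+1):
    """Explicit Euler for du/dt = direction * nu * laplacian(u).

    direction=+1 is ordinary diffusion; direction=-1 runs time backwards.
    """
    u = u.copy()
    for _ in range(steps):
        u = u + direction * dt * nu * laplacian(u, dx)
    return u

n = 64
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
dx = x[1] - x[0]
nu = 1.0
dt = 0.2 * dx**2 / nu          # within the forward stability limit of 0.5

u0 = np.sin(x) + 0.5 * np.sin(3.0 * x)
u_forward = integrate(u0, nu, dx, dt, steps=200, direction=+1)

# Assumed tiny error in the "observed" present state.
rng = np.random.default_rng(0)
u_observed = u_forward + 1e-12 * rng.standard_normal(n)

u_back = integrate(u_observed, nu, dx, dt, steps=200, direction=-1)

print("max |u0|       :", np.max(np.abs(u0)))
print("max |u_forward|:", np.max(np.abs(u_forward)))
print("max |u_back|   :", np.max(np.abs(u_back)))
```

Forward, the smallest scales are damped the fastest; backward, the same scales grow the fastest, so the invisible error in `u_observed` ends up dominating `u_back` instead of recovering `u0`.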
The reason is the Butterfly Effect. In a system where the current state depends on the previous state in an iterative way, you can get chaotic effects. Chaotic effects can magnify extremely tiny inputs into gigantic changes over time.
This was first noted by the excellent mathematician and meteorologist Edward Lorenz. This is a decent explanation of how he came to notice that the equations predicting the weather are extremely sensitive to current conditions. You simply can't build a computer with enough precision to do a good job.
Since tiny fluctuations can cause huge effects over time, you have to ask yourself: how much information can your simulation encompass? Lorenz showed that tiny things can change the entire landscape over time. To be accurate, a simulation would have to take into account every source of small changes: sunspot activity, the wobble of the moon, the gravitational tug from Pluto... the list is endless.
So unfortunately, for a chaotic system like weather, you can't predict previous or future states with any accuracy.
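This sensitivity is easy to see in the three-variable system Lorenz published in 1963, which is a drastically simplified convection model rather than a weather model. The sketch below integrates two copies whose starting points differ by $10^{-10}$ in one variable. The RK4 integrator, step size, run length and starting point are choices made for this example.

```python
import numpy as np

def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz (1963) system, classic parameters."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(state + 0.5 * dt * k1)
    k3 = f(state + 0.5 * dt * k2)
    k4 = f(state + dt * k3)
    return state + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def separation_history(state_a, state_b, dt=0.01, steps=4000):
    """Distance between two trajectories at every time step."""
    sep = np.empty(steps + 1)
    sep[0] = np.linalg.norm(state_a - state_b)
    for i in range(1, steps + 1):
        state_a = rk4_step(lorenz, state_a, dt)
        state_b = rk4_step(lorenz, state_b, dt)
        sep[i] = np.linalg.norm(state_a - state_b)
    return sep

a0 = np.array([1.0, 1.0, 1.0])
b0 = a0 + np.array([1e-10, 0.0, 0.0])   # assumed tiny measurement difference

separations = separation_history(a0, b0)
print("initial separation:", separations[0])
print("largest separation:", separations.max())
```

The separation grows roughly exponentially until it is as large as the attractor itself, after which the two runs are effectively unrelated.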
In the case of predicting past values from current values, the same architecture can be used, as it can learn the backward model just as directly as a forward model.