original by Michael Weiss and John Baez
In flat spacetime (the backdrop for special relativity) you can phrase energy conservation in two ways: as a differential equation, or as an equation involving integrals (gory details below). The two formulations are mathematically equivalent. But when you try to generalize this to curved spacetimes (the arena for general relativity) this equivalence breaks down. The differential form extends with nary a hiccup; not so the integral form. The differential form says, loosely speaking, that no energy is created in any infinitesimal piece of spacetime. The integral form says the same for a finite-sized piece. (This may remind you of the "divergence" and "flux" forms of Gauss's law in electrostatics, or the equation of continuity in fluid dynamics. Hold on to that thought!)
An infinitesimal piece of spacetime "looks flat", while the effects of curvature become evident in a finite piece. (The same holds for curved surfaces in space, of course.) GR relates curvature to gravity. Now, even in Newtonian physics, you must include gravitational potential energy to get energy conservation. And GR introduces the new phenomenon of gravitational waves; perhaps these carry energy as well? Perhaps we need to include gravitational energy in some fashion, to arrive at a law of energy conservation for finite pieces of spacetime?
Casting about for a mathematical expression of these ideas, physicists came up with something called an energy pseudo-tensor. (In fact, several of 'em!) Now, GR takes pride in treating all coordinate systems equally. Mathematicians invented tensors precisely to meet this sort of demand--- if a tensor equation holds in one coordinate system, it holds in all. Pseudo-tensors are not tensors (surprise!), and this alone raises eyebrows in some circles. In GR, one must always guard against mistaking artifacts of a particular coordinate system for real physical effects. (See the FAQ entry on black holes for some examples.)
These pseudo-tensors have some rather strange properties. If you choose the "wrong" coordinates, they are non-zero even in flat empty spacetime. By another choice of coordinates, they can be made zero at any chosen point, even in a spacetime full of gravitational radiation. For these reasons, most physicists who work in general relativity do not believe the pseudo-tensors give a good local definition of energy density, although their integrals are sometimes useful as a measure of total energy.
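A small worked example may make the coordinate dependence less mysterious. It is only an illustration, and it assumes the standard fact that the usual pseudo-tensors are assembled from first derivatives of the metric (equivalently, from Christoffel symbols), which are not tensor components. Take the flat plane: in Cartesian coordinates every Christoffel symbol vanishes, but in polar coordinates some do not, although nothing about the geometry has changed.

```latex
% Flat plane, two coordinate systems (illustrative example, not from the FAQ).
\begin{align*}
  ds^2 &= dx^2 + dy^2
       &&\Longrightarrow\quad \Gamma^{i}{}_{jk} = 0 \ \text{everywhere}, \\
  ds^2 &= dr^2 + r^2\, d\varphi^2
       &&\Longrightarrow\quad \Gamma^{r}{}_{\varphi\varphi} = -r, \qquad
          \Gamma^{\varphi}{}_{r\varphi} = \Gamma^{\varphi}{}_{\varphi r} = \frac{1}{r}.
\end{align*}
```

Anything built from such quantities can be non-zero in flat space with the "wrong" coordinates, and can be made to vanish at a chosen point by switching to coordinates that are locally inertial there.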
One other complaint about the pseudo-tensors deserves mention. Einstein argued that all energy has mass, and all mass acts gravitationally. Does "gravitational energy" itself act as a source of gravity? Now, the Einstein field equations are
            G_{mu nu} = 8 pi T_{mu nu}
Here G_{mu nu} is the Einstein curvature tensor, which encodes
information about the curvature of spacetime, and T_{mu nu} is the
so-called stress-energy tensor, which we will meet again below.  T_{mu nu}
represents the energy due to matter and electromagnetic fields, but
includes NO contribution from "gravitational energy".  So one can argue
that "gravitational energy" does NOT act as a source of gravity.  On the
other hand, the Einstein field equations are non-linear; this implies that
gravitational waves interact with each other (unlike light waves in
Maxwell's (linear) theory).  So one can argue that "gravitational energy"
IS a source of gravity.
In certain special cases, energy conservation works out with fewer caveats. The two main examples are static spacetimes and asymptotically flat spacetimes.
Let's look at four examples before plunging deeper into the math. Three examples involve redshift, the other, gravitational radiation.
Despite this success, Einstein's formula remained controversial for many years, partly because of the subtleties surrounding energy conservation in GR. The need to understand this situation better has kept GR theoreticians busy over the last few years. Einstein's formula now seems well-established, both theoretically and observationally.
It's time to look at mathematical fine points. There are many to choose from! The definition of asymptotically flat, for example, calls for some care (see Stewart); one worries about "boundary conditions at infinity". (In fact, both spatial infinity and "null infinity" clamor for attention--- leading to different kinds of total energy.) The static case has close connections with Noether's theorem (see Goldstein or Arnold). If the catch-phrase "time translation symmetry implies conservation of energy" rings a bell (perhaps from quantum mechanics), then you're on the right track. (Check out "Killing vector" in the index of MTW, Wald, or Sachs and Wu.)
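For readers who want to see why a Killing vector helps, here is the standard short computation, offered as a sketch. The symbols are assumptions of this sketch rather than notation from the text: xi is a Killing vector (for a static spacetime, the generator of time translations), T is the symmetric stress-energy tensor, and T is taken to satisfy the covariant conservation law that appears as Equation 3 below.

```latex
% Assumptions: \nabla_\mu T^{\mu\nu} = 0, \; T^{\mu\nu} = T^{\nu\mu}, \;
% and the Killing equation \nabla_\mu \xi_\nu + \nabla_\nu \xi_\mu = 0.
\begin{align*}
  J^{\mu} &:= T^{\mu\nu}\,\xi_{\nu}, \\
  \nabla_{\mu} J^{\mu}
    &= \left(\nabla_{\mu} T^{\mu\nu}\right)\xi_{\nu}
       + T^{\mu\nu}\,\nabla_{\mu}\xi_{\nu} \\
    &= 0 + \tfrac{1}{2}\, T^{\mu\nu}
       \left(\nabla_{\mu}\xi_{\nu} + \nabla_{\nu}\xi_{\mu}\right) = 0 .
\end{align*}
```

The point of the construction is that J is an ordinary vector current, like electric current, so the flux of J through a surface is a scalar and can be added up over a finite region without ambiguity.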
But two issues call for more discussion. Why does the equivalence between the two forms of energy conservation break down? How do the pseudo-tensors slide around this difficulty?
We've seen already that we should be talking about the energy-momentum 4-vector, not just its time-like component (the energy). Let's consider first the case of flat Minkowski spacetime. Recall that the notion of "inertial frame" corresponds to a special kind of coordinate system (Minkowskian coordinates).
Pick an inertial reference frame. Pick a volume V in this frame, and pick two times t=t_0 and t=t_1. One formulation of energy-momentum conservation says that the energy-momentum inside V changes only because of energy-momentum flowing across the boundary surface (call it S). It is "conceptually difficult, mathematically easy" to define a quantity T so that the captions on Equation 1 (below) are correct. (The quoted phrase comes from Sachs and Wu.)
  Equation 1:  (valid in flat Minkowski spacetime, when Minkowskian
                coordinates are used) 
                                               t=t_1
       /                  /                    /
       |                  |                    |
       | T dV     -       | T dV       =       | T dt dS
       /                  /                    /
      V,t=t_0           V,t=t_1               t=t_0
   p contained       p contained            p flowing out through
   in volume V    -  in volume V       =    boundary S of V
   at time t_0       at time t_1            during t=t_0 to t=t_1
   (Note: p = energy-momentum 4-vector)
T is called the stress-energy tensor.  You don't need to know what
that means! ---just that you can integrate T, as shown, to get
4-vectors.  Equation 1 may remind you of Gauss's theorem, which deals
with flux across a boundary.  If you look at Equation 1 in the right
4-dimensional frame of mind, you'll discover it really says that the
flux across the boundary of a certain 4-dimensional hypervolume is
zero.  (The hypervolume is swept out by V during the interval t=t_0
to t=t_1.)  MTW, chapter 7, explains this with pictures galore.  (See
also Wheeler.)
A 4-dimensional analogue to Gauss's theorem shows that Equation 1 is equivalent to:
  Equation 2:  (valid in flat Minkowski spacetime, with Minkowskian
                coordinates)
       coord_div(T) = sum_mu (partial T/partial x_mu) = 0
We write "coord_div" for the divergence, for we will meet another
divergence in a moment.  Proof?  Quite similar to Gauss's theorem: if
the divergence is zero throughout the hypervolume, then the flux
across the boundary must also be zero.  On the other hand, the flux
out of an infinitesimally small hypervolume turns out to be the
divergence times the measure of the hypervolume.
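The equivalence of Equations 1 and 2 can be checked numerically in a toy setting. The sketch below is an illustration under loud assumptions, not anything from the original text: it uses one space dimension, tracks only the energy component, and takes a made-up right-moving pulse with density rho(t, x) = f(x - t) and flux j(t, x) = f(x - t), which satisfies the differential form d(rho)/dt + d(j)/dx = 0. All function names are hypothetical.

```python
import math


def pulse(u):
    """Smooth bump profile f(u); chosen arbitrarily for this illustration."""
    return math.exp(-u * u)


def rho(t, x):
    """Energy density of a pulse moving right at unit speed."""
    return pulse(x - t)


def flux(t, x):
    """Energy flux; for a unit-speed right-mover it equals the density."""
    return pulse(x - t)


def integrate(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def energy_in_volume(t, a, b):
    """Energy contained in the 'volume' [a, b] at time t."""
    return integrate(lambda x: rho(t, x), a, b)


def outflow(t0, t1, a, b):
    """Net energy leaving [a, b] through its two end points during [t0, t1]."""
    return integrate(lambda t: flux(t, b) - flux(t, a), t0, t1)


a, b = -1.0, 2.0      # the volume V
t0, t1 = 0.0, 1.5     # the two times

lhs = energy_in_volume(t0, a, b) - energy_in_volume(t1, a, b)
rhs = outflow(t0, t1, a, b)
print("energy lost from V :", lhs)
print("energy flowed out  :", rhs)
```

The two printed numbers should agree to within the integration error, which is the integral form (Equation 1) holding because the differential form (Equation 2) does. Here the quantities being added are plain numbers at each point, so no question of comparing vectors at different places arises.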
Pass now to the general case of any spacetime satisfying Einstein's field equation. It is easy to generalize the differential form of energy-momentum conservation, Equation 2:
  Equation 3:  (valid in any GR spacetime)
        covariant_div(T) = sum_mu nabla_mu(T) = 0    
                    (where nabla_mu = covariant derivative)
(Side comment: Equation 3 is the correct generalization of Equation 2 for
SR when non-Minkowskian coordinates are used.)
GR relies heavily on the covariant derivative, because the covariant derivative of a tensor is a tensor, and as we've seen, GR loves tensors. Equation 3 follows from Einstein's field equation (because something called Bianchi's identity says that covariant_div(G)=0). But Equation 3 is no longer equivalent to Equation 1!
Why not? Well, the familiar form of Gauss's theorem (from electrostatics) holds for any spacetime, because essentially you are summing fluxes over a partition of the volume into infinitesimally small pieces. The sum over the faces of one infinitesimal piece is a divergence. But the total contribution from an interior face is zero, since what flows out of one piece flows into its neighbor. So the integral of the divergence over the volume equals the flux through the boundary. "QED".
But for the equivalence of Equations 1 and 3, we would need an extension of Gauss's theorem. Now the flux through a face is not a scalar, but a vector (the flux of energy-momentum through the face). The argument just sketched involves adding these vectors, which are defined at different points in spacetime. Such "remote vector comparison" runs into trouble precisely for curved spacetimes.
The mathematician Levi-Civita invented the standard solution to this problem, and dubbed it "parallel transport". It's easy to picture parallel transport: just move the vector along a path, keeping its direction "as constant as possible". (Naturally, some non-trivial mathematics lurks behind the phrase in quotation marks. But even pop-science expositions of GR do a good job explaining parallel transport.) The parallel transport of a vector depends on the transportation path; for the canonical example, imagine parallel transporting a vector on a sphere. But parallel transportation over an "infinitesimal distance" suffers no such ambiguity. (It's not hard to see the connection with curvature.)
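The sphere example can be tried numerically. The following is a rough sketch, not a definitive implementation: it approximates parallel transport on the unit sphere by taking many small steps along a path and, at each step, projecting the vector back onto the tangent plane and restoring its length. That projection scheme is an assumption of this sketch (it approximates Levi-Civita transport for a surface embedded in ordinary space as the steps shrink), and all names are hypothetical. A vector is carried from the north pole to the same point on the equator along two different routes.

```python
import math


def project_to_tangent(v, n):
    """Drop the component of v along the unit normal n, keeping v's length."""
    length = math.sqrt(sum(c * c for c in v))
    d = sum(vc * nc for vc, nc in zip(v, n))
    w = [vc - d * nc for vc, nc in zip(v, n)]
    wlen = math.sqrt(sum(c * c for c in w))
    return [c * length / wlen for c in w]


def transport(v, path_point, steps=10000):
    """Carry v along path_point(s), 0 <= s <= 1, in many small steps.

    On the unit sphere the position vector of a point is also the unit
    normal there, so path_point(s) doubles as the normal.
    """
    for k in range(1, steps + 1):
        v = project_to_tangent(v, path_point(k / steps))
    return v


def meridian_90(s):
    """North pole straight down to the equator at longitude 90 degrees."""
    a = s * math.pi / 2
    return (0.0, math.sin(a), math.cos(a))


def meridian_0(s):
    """North pole down to the equator at longitude 0."""
    a = s * math.pi / 2
    return (math.sin(a), 0.0, math.cos(a))


def equator(s):
    """Along the equator from longitude 0 to longitude 90 degrees."""
    a = s * math.pi / 2
    return (math.cos(a), math.sin(a), 0.0)


start = [1.0, 0.0, 0.0]  # tangent vector at the north pole

direct = transport(start, meridian_90)
detour = transport(transport(start, meridian_0), equator)

dot = sum(p * q for p, q in zip(direct, detour))
angle_deg = math.degrees(math.acos(max(-1.0, min(1.0, dot))))
print("via direct meridian:", direct)
print("via detour         :", detour)
print("angle between them (degrees):", angle_deg)
```

Both routes end at the same point, yet the transported vectors come out perpendicular to each other. The two routes together enclose one eighth of the sphere, and the mismatch reflects the curvature inside that region.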
To compute a divergence, we need to compare quantities (here vectors) on opposite faces. Using parallel transport for this leads to the covariant divergence. This is well-defined, because we're dealing with an infinitesimal hypervolume. But to add up fluxes all over a finite-sized hypervolume (as in the contemplated extension of Gauss's theorem) runs smack into the dependence on transportation path. So the flux integral is not well-defined, and we have no analogue for Gauss's theorem.
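The same obstruction shows up in formulas. The identities below are standard results quoted as a sketch; the symbols g (determinant of the metric), Gamma (Christoffel symbols), and the vector V are notation assumed here, not taken from the text above.

```latex
% Covariant divergence of a vector versus a rank-2 tensor.
\begin{align*}
  \nabla_{\mu} V^{\mu}
    &= \frac{1}{\sqrt{|g|}}\,
       \partial_{\mu}\!\left(\sqrt{|g|}\, V^{\mu}\right), \\
  \nabla_{\mu} T^{\mu\nu}
    &= \frac{1}{\sqrt{|g|}}\,
       \partial_{\mu}\!\left(\sqrt{|g|}\, T^{\mu\nu}\right)
       + \Gamma^{\nu}{}_{\mu\lambda}\, T^{\mu\lambda}.
\end{align*}
```

For a vector, the covariant divergence is a coordinate divergence in disguise, so Gauss's theorem goes through and the flux integral is a scalar. For the stress-energy tensor there is a leftover term involving Gamma, which is exactly the piece that cannot be absorbed into a total derivative and integrated away.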
One way to get round this is to pick one coordinate system, and transport vectors so their components stay constant. Partial derivatives replace covariant derivatives, and Gauss's theorem is restored. The energy pseudo-tensors take this approach (at least some of them do). If you can mangle Equation 3 (covariant_div(T) = 0) into the form:
       coord_div(Theta) = 0
then you can get an "energy conservation law" in integral form.
Einstein was the first to do this; Dirac, Landau and Lifshitz, and
Weinberg all came up with variations on this theme.  We've said
enough already on the pros and cons of this approach.
We will not delve into definitions of energy in general relativity such as the Hamiltonian (amusingly, the energy of a closed universe always works out to zero according to this definition), various kinds of energy one hopes to obtain by "deparametrizing" Einstein's equations, or "quasilocal energy". There's quite a bit to say about this sort of thing! Indeed, the issue of energy in general relativity has a lot to do with the notorious "problem of time" in quantum gravity.... but that's another can of worms.
References (vaguely in order of difficulty):