50

A vector space is a set whose elements satisfy certain axioms. There are physical entities that satisfy these properties and that may not be arrows. A co-ordinate transformation is linear map from a vector to itself with a change of basis. The transformation is an abstract concept; it is just a mapping. To calculate it we need a basis and matrices, and how a transformation ends up looking depends only on the basis we choose: a transformation can look like a diagonal matrix if an eigenbasis is used, and so on. It has nothing to do with the vectors it is mapping; only the dimension of the vector spaces is important.

So it is foolish to distinguish vectors by how their components change under a co-ordinate transformation, since that depends on the basis you used. So there is actually no difference between a contravariant and a covariant vector; there is a difference between a contravariant and a covariant basis, as is shown in arXiv:1002.3217. An inner product is between elements of the same vector space and not between two vector spaces, it is not how it is defined.

Is this approach correct?

Along with the approach mentioned, we can view covectors as members of the dual space of the contra-vector space. What advantage does this approach have over the former one mentioned in my post?

Addendum: So now there are contravariant vectors and their duals, called covariant vectors. But the duals are defined only once the contravectors are set up, because they are the maps from the space of contravectors to $\mathbb R$, and thus it won't make sense to talk of covectors alone. Then what does it mean that the gradient is a covector? Saying that it is one because it transforms in a certain way makes no sense.

Emilio Pisanty
  • 137,480
Isomorphic
  • 1,616

10 Answers

65

This is not really an answer to your question, essentially because there isn't (currently) a question in your post, but it is too long for a comment.

Your statement that

A co-ordinate transformation is linear map from a vector to itself with a change of basis.

is muddled and ultimately incorrect. Take some vector space $V$ and two bases $\beta$ and $\gamma$ for $V$. Each of these bases can be used to establish a representation map $r_\beta:\mathbb R^n\to V$, given by $$r_\beta(v)=\sum_{j=1}^nv_j e_j$$ if $v=(v_1,\ldots,v_n)$ and $\beta=\{e_1,\ldots,e_n\}$. The coordinate transformation is not a linear map from $V$ to itself. Instead, it is the map $$r_\gamma^{-1}\circ r_\beta:\mathbb R^n\to\mathbb R^n,\tag 1$$ and takes coordinates to coordinates.
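A minimal numerical sketch of map (1), assuming $V=\mathbb R^2$ and two illustrative bases stored as the columns of hypothetical matrices `B` (for $\beta$) and `G` (for $\gamma$). The numbers are made up; only the structure $r_\gamma^{-1}\circ r_\beta$ comes from the text.

```python
import numpy as np

# Hypothetical bases of V = R^2, stored as matrix columns (illustrative values).
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])   # beta  = {e_1, e_2}
G = np.array([[2.0, 0.0],
              [1.0, 1.0]])   # gamma = {g_1, g_2}

def r(basis, coords):
    """Representation map: coordinate tuple in R^n -> vector in V."""
    return basis @ coords

# Map (1): r_gamma^{-1} o r_beta, taking beta-coordinates to gamma-coordinates.
M = np.linalg.solve(G, B)    # equals G^{-1} B without forming the inverse

v_beta = np.array([3.0, -2.0])
v_gamma = M @ v_beta

# Both coordinate tuples represent the same element of V.
same_vector = np.allclose(r(B, v_beta), r(G, v_gamma))
print(same_vector)
```

The matrix `M` acts on coordinate tuples, not on elements of $V$; the vector itself never changes.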

Now, to go to the heart of your confusion, it should be stressed that covectors are not members of $V$; as such, the representation maps do not apply to them directly in any way. Instead, they belong to the dual space $V^\ast$, which I'm hoping you're familiar with. (In general, I would strongly discourage you from reading texts that pretend to lay down the law on the distinction between vectors and covectors without talking at length about the dual space.)

The dual space is the vector space of all linear functionals from $V$ into its scalar field: $$V^*=\{\varphi:V\to\mathbb R:\varphi\text{ is linear}\}.$$ This has the same dimension as $V$, and any basis $\beta$ has a unique dual basis $\beta^*=\{\varphi_1,\ldots,\varphi_n\}$ characterized by $\varphi_i(e_j)=\delta_{ij}$. Since it is a different basis to $\beta$, it is not surprising that the corresponding representation map is different.

To lift the representation map to the dual vector space, one needs the notion of the adjoint of a linear map. As it happens, there is in general no way to lift a linear map $L:V\to W$ to a map from $V^*$ to $W^*$; instead, one needs to reverse the arrow. Given such a map, a functional $f\in W^*$ and a vector $v\in V$, there is only one combination which makes sense, which is $f(L(v))$. The mapping $$v\mapsto f(L(v))$$ is a linear mapping from $V$ into $\mathbb R$, and it's therefore in $V^*$. It is denoted by $L^*(f)$, and defines the action of the adjoint $$L^*:W^*\to V^*.$$

If you apply this to the representation maps on $V$, you get the adjoints $r_\beta^*:V^*\to\mathbb R^{n,*}$, where the latter is canonically equivalent to $\mathbb R^n$ because it has a canonical basis. The inverse of this map, $(r_\beta^*)^{-1}$, is the representation map $r_{\beta^*}:\mathbb R^n\cong\mathbb R^{n,*}\to V^*$. This is the origin of the 'inverse transpose' rule for transforming covectors.

To get the transformation rule for covectors between two bases, you need to string two of these together: $$ \left((r_\gamma^*)^{-1}\right)^{-1}\circ(r_\beta^*)^{-1}=r_\gamma^*\circ (r_\beta^*)^{-1}:\mathbb R^n\to \mathbb R^n, $$ which is very different to the one for vectors, (1).
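A small sketch of the 'inverse transpose' rule, under the assumption that $V=\mathbb R^2$ and that functionals are stored by their components in the dual of the standard basis. The matrices `B` and `G` are hypothetical bases (columns are basis vectors), and `M` and `C` are illustrative names for the vector and covector coordinate maps.

```python
import numpy as np

# Hypothetical bases of V = R^2 (columns are basis vectors).
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])
G = np.array([[2.0, 0.0],
              [1.0, 1.0]])

# Vector coordinates: r_gamma^{-1} o r_beta  ->  G^{-1} B
M = np.linalg.solve(G, B)

# Covector coordinates: the adjoint r_beta^* acts on components as B^T,
# so r_gamma^* o (r_beta^*)^{-1}  ->  G^T (B^T)^{-1}
C = G.T @ np.linalg.inv(B.T)

# C is the inverse transpose of M.
is_inverse_transpose = np.allclose(C, np.linalg.inv(M).T)

# The pairing phi(v) does not depend on which basis is used.
v_beta = np.array([3.0, -2.0])
phi_beta = np.array([0.5, 4.0])
pairing_beta = phi_beta @ v_beta
pairing_gamma = (C @ phi_beta) @ (M @ v_beta)
print(is_inverse_transpose, pairing_beta, pairing_gamma)
```

The two coordinate maps differ unless `M` happens to be orthogonal, which is why the distinction disappears in orthonormal Cartesian frames.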

Still think that vectors and covectors are the same thing?


Addendum

Let me, finally, address another misconception in your question:

An inner product is between elements of the same vector space and not between two vector spaces, it is not how it is defined.

Inner products are indeed defined by taking both inputs from the same vector space. Nevertheless, it is still perfectly possible to define a bilinear form $\langle \cdot,\cdot\rangle:V^*\times V\to\mathbb R$ which takes one covector and one vector to give a scalar; it is simply the action of the former on the latter: $$\langle\varphi,v\rangle=\varphi(v).$$ This bilinear form is always guaranteed and presupposes strictly less structure than an inner product. This is the 'inner product' which reads $\varphi_j v^j$ in Einstein notation.

Of course, this does relate to the inner product structure $ \langle \cdot,\cdot\rangle_\text{I.P.}$ on $V$ when there is one. Having such a structure enables one to identify vectors and covectors in a canonical way: given a vector $v$ in $V$, its corresponding covector is the linear functional $$ \begin{align} i(v)=\langle v,\cdot\rangle_\text{I.P.} : V&\longrightarrow\mathbb R \\ w&\mapsto \langle v,w\rangle_\text{I.P.}. \end{align} $$ By construction, both bilinear forms are canonically related, so that the 'inner product' $\langle\cdot,\cdot\rangle$ between $i(v)\in V^*$ and $w\in V$ is exactly the same as the inner product $\langle\cdot,\cdot\rangle_\text{I.P.}$ between $v\in V$ and $w\in V$, i.e., we have $$ \langle i(v),w\rangle = \langle v,w\rangle_\text{I.P.}. $$ That use of language is perfectly justified.
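As a rough illustration, assume $V=\mathbb R^2$ with an inner product given by a hypothetical symmetric positive-definite Gram matrix `gram`. Then the map $i$ multiplies components by `gram`, and the identity $\langle i(v),w\rangle=\langle v,w\rangle_\text{I.P.}$ can be checked directly.

```python
import numpy as np

# Hypothetical inner product on R^2: a symmetric positive-definite Gram matrix.
gram = np.array([[2.0, 1.0],
                 [1.0, 3.0]])

def inner(v, w):
    """Inner product <v, w>_IP; needs the extra structure 'gram'."""
    return v @ gram @ w

def i(v):
    """Components of the covector <v, .>_IP in the dual of the standard basis."""
    return gram @ v

def pair(phi, w):
    """Dual pairing <phi, w> = phi(w); needs no inner product at all."""
    return phi @ w

v = np.array([1.0, 2.0])
w = np.array([-1.0, 4.0])

lhs = pair(i(v), w)    # covector acting on a vector
rhs = inner(v, w)      # inner product of two vectors
print(lhs, rhs)
```

A different choice of `gram` gives a different covector `i(v)` for the same `v`, while `pair` is unaffected.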


Addendum 2, on your question about the gradient.

I should really try and convince you at this point that the transformation laws are in fact enough to show something is a covector. (The way the argument goes is that one can define a linear functional on $V$ via the form in $\mathbb R^{n*}$ given by the components, and the transformation laws ensure that this form in $V^*$ is independent of the basis; alternatively, given the components $f_\beta,f_\gamma\in\mathbb R^n$ with respect to two bases, the representation maps give the forms $r_{\beta^*}(f_\beta)$ and $r_{\gamma^*}(f_\gamma)$ in $V^*$, and the two are equal because of the transformation laws.)

However, there is indeed a deeper reason for the fact that the gradient is a covector. Essentially, it is to do with the fact that the equation $$df=\nabla f\cdot dx$$ does not actually need a dot product; instead, it relies on the simpler structure of the dual-primal bilinear form $\langle \cdot,\cdot\rangle$.

To make this precise, consider an arbitrary function $T:\mathbb R^n\to\mathbb R^m$. The derivative of $T$ at $x_0$ is defined to be the (unique) linear map $dT_{x_0}:\mathbb R^n\to\mathbb R^m$ such that $$ T(x)=T(x_0)+dT_{x_0}(x-x_0)+O(|x-x_0|^2), $$ if it exists. For a scalar function $f:\mathbb R^n\to\mathbb R$ (the case $m=1$), the gradient is exactly this map; it was born as a linear functional, whose coordinates over any basis are $\frac{\partial f}{\partial x_j}$ to ensure that the multi-dimensional chain rule, $$ df=\sum_j \frac{\partial f}{\partial x_j}d x_j, $$ is satisfied. To make things easier to understand for undergraduates who are fresh out of 1D calculus, this linear map is most often 'dressed up' as the corresponding vector, which is uniquely obtainable through the Euclidean structure, and whose action must therefore go back through that Euclidean structure to get to the original $df$.
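A hedged numerical sketch of this point: take an illustrative function `f` on $\mathbb R^2$ and a hypothetical linear change of coordinates $x=Ay$. The partial derivatives with respect to $y$ come out as $A^{T}$ times those with respect to $x$, while displacement components go with $A^{-1}$, so that $df$ is unchanged. The function, the matrix `A`, and the finite-difference helper `partials` are all assumptions made for the example.

```python
import numpy as np

def f(x):
    """Illustrative scalar function on R^2."""
    return x[0]**2 + 3.0 * x[0] * x[1]

def grad_f(x):
    """Analytic partial derivatives of f in the x coordinates."""
    return np.array([2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]])

# Hypothetical linear change of coordinates: x = A y.
A = np.array([[2.0, 1.0],
              [0.0, 0.5]])

def partials(func, point, h=1e-6):
    """Central-difference partial derivatives of func at point."""
    out = np.zeros(len(point))
    for j in range(len(point)):
        step = np.zeros(len(point))
        step[j] = h
        out[j] = (func(point + step) - func(point - step)) / (2.0 * h)
    return out

y0 = np.array([1.0, -2.0])
x0 = A @ y0

# Partial derivatives in the y coordinates, computed directly ...
dfdy = partials(lambda y: f(A @ y), y0)
# ... and predicted by the covector rule (transpose of A, not its inverse).
covariant = A.T @ grad_f(x0)

# A displacement transforms the opposite way, so df is coordinate independent.
dx = np.array([0.3, -0.1])
dy = np.linalg.solve(A, dx)
df_x = grad_f(x0) @ dx
df_y = covariant @ dy
print(dfdy, covariant, df_x, df_y)
```

No dot product on the underlying space is used anywhere: `@` here only sums products of components, which is the pairing $\langle\cdot,\cdot\rangle$.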


Addendum 3.

OK, it is now sort of clear what the main question is (unless that changes again), though it is still not particularly clear in the question text. The thing that needs addressing is stated in the OP's answer in this thread:

the dual vector space is itself a vector space and the fact that it needs to be cast off as a row matrix is based on how we calculate linear maps and not on what linear maps actually are. If I had defined matrix multiplication differently, this wouldn't have happened.

I will also address, then, this question: given that the dual (/cotangent) space is also a vector space, what forces us to consider it 'distinct' enough from the primal that we display it as row vectors instead of columns, and say its transformation laws are different?

The main reason for this is well addressed by Christoph in his answer, but I'll expand on it. The notion that something is co- or contra-variant is not well defined 'in vacuum'. Literally, the terms mean "varies with" and "varies against", and they are meaningless unless one says what the object in question varies with or against.

In the case of linear algebra, one starts with a given vector space, $V$. The unstated reference is always, by convention, the basis of $V$: covariant objects transform exactly like the basis, and contravariant objects use the transpose-inverse of the basis transformation's coefficient matrix.

One can, of course, turn the tables, and change one's focus to the dual, $W=V^*$, in which case the primal $V$ now becomes the dual, $W^*=V^{**}\cong V$. In this case, quantities that used to transform with the primal basis now transform against the dual basis, and vice versa. This is exactly why we call it the dual: there exists a full duality between the two spaces.

However, as is the case anywhere in mathematics where two fully dual spaces are considered (example, example, example, example, example), one needs to break this symmetry to get anywhere. There are two classes of objects which behave differently, and a transformation that swaps the two. This has two distinct, related advantages:

  • Anything one proves for one set of objects has a dual fact which is automatically proved.
  • Therefore, one need only ever prove one version of the statement.

When considering vector transformation laws, one always has (or can have, or should have), in the back of one's mind, the fact that one can rephrase the language in terms of the duality-transformed objects. However, since the content of the statements is not altered by the transformation, it is not typically useful to perform the transformation: one needs to state some version, and there's not really any point in stating both. Thus, one (arbitrarily, -ish) breaks the symmetry, rolls with that version, and is aware that a dual version of all the development is also possible.

However, this dual version is not the same. Covectors can indeed be expressed as row vectors with respect to some basis of covectors, and the coefficients of vectors in $V$ would then vary with the new basis instead of against, but then for each actual implementation, the matrices you would use would of course be duality-transformed. You would have changed the language but not the content.

Finally, it's important to note that even though the dual objects are equivalent, it does not mean they are the same. This is why we call them dual, instead of simply saying that they're the same! As regards vector spaces, then, one still has to prove that $V$ and $V^*$ are not only dually related, but also different. This is made precise in the statement that there is no natural isomorphism between a vector space and its dual, which is phrased and proved in the language of category theory. The notion of 'natural' isomorphism is tricky, but it would imply the following:

For each vector space $V$, you would have an isomorphism $\sigma_V:V\to V^*$. You would want this isomorphism to play nicely with the duality structure, and in particular with the duals of linear transformations, i.e. their adjoints. That means that for any vector spaces $V,W\in\mathrm{Vect}$ and any linear transformation $T:V\to W$, you would want the diagram

$$ \begin{array}{ccc} V & \xrightarrow{\;T\;} & W \\ {\scriptstyle\sigma_V}\downarrow & & \downarrow{\scriptstyle\sigma_W} \\ V^* & \xleftarrow{\;T^*\;} & W^* \end{array} $$

to commute. That is, you would want $T^* \circ \sigma_W \circ T$ to equal $\sigma_V$.

This is provably not possible to do consistently. The reason is that if $V=W$ and $T$ is an isomorphism, then $T$ and $T^*$ are in general different; for a simple counter-example you can just take a real multiple of the identity as $T$: with $T=\lambda\,\mathrm{id}_V$ one gets $T^*\circ\sigma_V\circ T=\lambda^2\sigma_V$, which differs from $\sigma_V$ unless $\lambda=\pm1$. This is precisely the formal statement of the intuition in garyp's great answer.
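The counter-example can be checked in components. The sketch below assumes $V=W=\mathbb R^3$, represents a candidate $\sigma_V$ by an arbitrary invertible matrix `sigma` (a made-up choice), and uses the fact that the adjoint $T^*$ acts on components as the transpose.

```python
import numpy as np

# Hypothetical candidate isomorphism sigma_V : V -> V^*, as an invertible matrix.
sigma = np.array([[2.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 0.0, 3.0]])

lam = 2.0
T = lam * np.eye(3)          # T = lambda * identity on V

# T^* acts on components as T transpose, so T^* o sigma o T -> T^T sigma T.
composite = T.T @ sigma @ T

scales_by_lambda_squared = np.allclose(composite, lam**2 * sigma)
diagram_commutes = np.allclose(composite, sigma)
print(scales_by_lambda_squared, diagram_commutes)
```

The check fails for every invertible `sigma`, since the factor $\lambda^2$ does not depend on the choice of candidate.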

In apples-and-pears language, what this means is that a general vector space $V$ and its dual $V^*$ are not only dual (in the sense that there exists a transformation that switches them and puts them back when applied twice), but they are also different (in the sense that there is no consistent way of identifying them), which is why the duality language is justified.


I've been rambling for quite a bit, and hopefully at least some of it is helpful. In summary, though, what I think you need to take away is the fact that

Just because dual objects are equivalent it doesn't mean they are the same.

This is also, incidentally, a direct answer to the question title: no, it is not foolish. They are equivalent, but they are still different.

Emilio Pisanty
  • 137,480
26

We expect a vector to change in a certain way when we change the scale we use to measure distance. Consider the vector $$\vec{x}=(1, 0, 0)\,\mathrm{m}$$ If we change scale and now measure in centimeters this vector becomes $$\vec{x}=(100, 0 ,0)\,\mathrm{cm}$$ Now consider a vector representing a force: $$ \vec{F}=(1,0,0)\,\mathrm{J/m}$$ where I've chosen to write J/m for Newtons to remind us that force is the gradient of a potential function. Well, that potential is not going to change because I've changed scales. It's still sitting out in space somewhere, just sitting there. So what does the force vector look like when measured in a cm-based frame? $$\vec{F} = (0.01, 0,0)\,\mathrm{J/cm}$$ That "vector" $\vec{F}$ doesn't transform the way $\vec{x}$ does! But notice that in either frame, calculating the work done by moving an object one meter against a force of one Newton gives the same result: $$W=\vec{F}\cdot\vec{x} = (1,0,0)\cdot (1,0,0) = (0.01,0,0)\cdot (100,0,0) = 1\,\mathrm{J}$$ Quantities defined as gradients belong to a vector space, but it's a different kind of vector space than that which contains distances. Thus we make the distinction between covariant (sometimes called cogradient -- "like a gradient" -- especially in older literature) and contravariant (or contragradient -- "opposite of gradient").

In order to get the physically important quantity energy to have the same value regardless of what frame we evaluate it in, we have to recognize that there are two types of vectors, and they must be treated differently upon change of coordinates.
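The arithmetic of the example can be written out as a short sketch. The variable names are illustrative; the only assumption is the conversion factor of 100 cm per m.

```python
import numpy as np

CM_PER_M = 100.0

# Displacement: components scale WITH the number of units per length.
x_m = np.array([1.0, 0.0, 0.0])            # in m
x_cm = CM_PER_M * x_m                      # in cm

# Force as a gradient of energy: components scale the OPPOSITE way.
F_J_per_m = np.array([1.0, 0.0, 0.0])      # in J/m
F_J_per_cm = F_J_per_m / CM_PER_M          # in J/cm

work_m = F_J_per_m @ x_m                   # J, evaluated in the metre frame
work_cm = F_J_per_cm @ x_cm                # J, evaluated in the centimetre frame
print(x_cm, F_J_per_cm, work_m, work_cm)
```

Scaling both sets of components by the same factor would change the computed work by a factor of $10^4$.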

garyp
  • 22,633
8

There are two more points that can be made here. Sorry if I repeat someone.

In a way you are right that if you have a vector space and its dual there is no intrinsic way to say which space is the original and which is the dual. This is because there is a canonical isomorphism between a vector space and the dual of its dual. In other words if $V$ is a vector space and $W=V^*$ its dual, then $W^*=(V^*)^*$ is isomorphic to $V$ (in a canonical way). Hence the pair $V$ and $W$ could be viewed as a vector space $V$ and its dual $W$ or as a vector space $W$ and its dual $V= W^*$.

In the context of a manifold, where usually the words contravariant and covariant vectors appear, you say that you need to define first the tangent space at a point and then its dual, the cotangent space, before you can talk about one-forms, differentials and so on. But this is not the case. It is true that that is the usual way in most books, but it is not the only possible way. If you are an algebraist in spirit you may have seen and prefer the following definition. Let $M$ be a differentiable manifold and $p\in M$ a point. Consider the ring $\mathcal O_p$ of germs of smooth functions at $p$. It is a local ring, i.e. it has a unique maximal ideal $\mathfrak m_p$, consisting of the germs of functions which vanish at $p$. Then the ring $\mathcal O_p/\mathfrak m_p$ is obviously isomorphic to the field of real numbers. The quotient $\mathfrak m_p/\mathfrak m^2_p$ is in a natural way a vector space over $\mathcal O_p/\mathfrak m_p=\mathbb R$. This is the cotangent space of the manifold at that point, usually denoted by $T^*_pM$. This way you can define the “covectors” without defining first the vectors. The tangent space is then the dual.

MBN
  • 3,875
5
  1. No, it is important to distinguish between covariant and contravariant tensors.

    OP's link mentions differential geometry. If one has only studied those objects in the context of pseudo-Riemannian manifolds $(M;g)$, which come equipped with an (invertible) metric $(0,2)$ tensor $g$, then the existence of the musical isomorphism may perhaps unnecessarily obfuscate the precise notions of covariant and contravariant tensors in some treatments.

    Thus it is recommended to study this in a bare setting of a manifold $M$ without assuming additional structures, such as, a metric tensor $g$.

    Examples:

  2. In fact, if one is confused about covariant and contravariant tensors, one should first study this in the realm of multi-linear maps of finite dimensional vector spaces $V$ (as opposed to the context of differential geometry and manifolds $M$).

    The above recommendation translates (in the multi-linear setting) into studying multi-linear maps of a finite-dimensional vector space $V$ without assuming additional structures, such as, a (non-degenerate) inner product $\langle \cdot | \cdot \rangle :V\times V \to \mathbb{R}$.

    Of course there are always infinitely many ways to put a (non-degenerate) inner product $\langle \cdot | \cdot \rangle$ on a finite-dimensional vector space $V$, each of which leads to a musical isomorphism: $V\cong V^*$. But the crucial point is that there is no canonical choice of a (non-degenerate) inner product $\langle \cdot | \cdot \rangle$ on $V$.

Qmechanic
  • 220,844
3

The notion of co- and contravariance depends on context: If you wanted to be as clear as possible, you should actually mention with respect to what the components transform co- or contravariantly.

In case of the algebraic dual of finite-dimensional vector spaces, the implied context is a change of basis of the vector space. Then, we can look at how the components of vectors and dual vectors behave with respect to that change.

In case of differential geometry, the implied context is the change of coordinates of the base manifold, which induces a change of basis of the tangent space given by the Jacobi matrix. With respect to that change of basis, the components of tangent vectors transform contravariantly, and the components of cotangent vectors transform covariantly.

It's worth mentioning at this point that tangent and cotangent vectors can be defined independently of their transformation laws and without making use of the duality pairing: Morally speaking (so we do not confuse the issue with technicalities), tangent vectors over a manifold $M$ are equivalence classes of maps $\mathbb R\to M$, whereas cotangent vectors are equivalence classes of maps $M\to\mathbb R$. Both form vector spaces in their own right, and either one can be considered the algebraic dual of the other once you introduce a notion of pairing. But they are distinct geometric objects, and one way to make that distinction explicit is by looking at how their coordinates behave.
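A rough sketch of that picture, taking $M=\mathbb R^2$ for simplicity: two hypothetical curves through the same point `p` with the same velocity represent the same tangent vector, and pairing them with a function `func` (a representative of a cotangent vector) gives the derivative of the composite $\mathbb R\to M\to\mathbb R$ at $t=0$. The curves, the function and the finite-difference step are all illustrative choices.

```python
import numpy as np

p = np.array([1.0, 2.0])
velocity = np.array([1.0, -1.0])

def curve_a(t):
    """A straight curve R -> M through p."""
    return p + t * velocity

def curve_b(t):
    """A bent curve through p with the same velocity at t = 0."""
    return p + t * velocity + t**2 * np.array([5.0, 7.0])

def func(x):
    """A map M -> R, representing a cotangent vector at p."""
    return x[0] * x[1] + np.sin(x[0])

def pairing(f, curve, h=1e-5):
    """d/dt f(curve(t)) at t = 0, by central difference."""
    return (f(curve(h)) - f(curve(-h))) / (2.0 * h)

pair_a = pairing(func, curve_a)
pair_b = pairing(func, curve_b)
print(pair_a, pair_b)
```

Both pairings approximate $1+\cos 1$, so the result depends only on the equivalence class of the curve, and no coordinates or transformation laws were needed to define it.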

Christoph
  • 14,001
1

The co/contra distinction only makes sense when talking about vector fields. Even then the difference only becomes apparent when dealing with curved spaces or at least curvilinear coordinate systems. The difference comes from how vectors relate back to the underlying space or manifold on which the fields are defined. Contravariant vectors then are what people normally think of as vectors. A lot of the formal machinery can be bypassed if you take the notion of a scalar field on a manifold as obvious :). A (contravariant) vector then is something that measures the rate of change of a scalar field at a point in a given direction. This is formalized by viewing vectors as operators on scalar fields satisfying certain conditions. This view makes contravariant vector fields functions from scalar fields to scalar fields. Covariant vectors (or covectors) then act on vectors to measure their component in a given direction. This makes covector fields functions from vector fields to scalar fields. This is not trivial, in that we are not assuming any metrics, norms, dot products or notions of orthogonality on the vectors or the underlying manifold. Once a metric is introduced we get a natural isomorphism between vectors and covectors. Covariant and contravariant bases are then used to represent the same geometric object as either a vector or a covector. Note that contravariant vectors are represented in terms of covariant bases and vice versa.

0

I will say that the standard definition of vectors and one-forms is not the world's cleanest. A modern definition would say that a vector field is a mapping from the functions on the space to themselves that is linear and satisfies the Leibniz rule (alternately, the tangent space is the local linear approximation of the space). Then, a one-form is a linear mapping from the vector fields to the functions on the space.

0

There are vectors of a vector space (an abstract mathematical entity). Then for a vector space there is a corresponding dual space. An element of the dual space maps an element of the vector space to $\mathbb R$; this number is denoted $\langle a,b\rangle$, the inner product. Now, the dual space and the vector space have bases which are related by $\langle e^i,e_j\rangle=\delta_{ij}$. Now suppose there is a linear transformation $Av=b$ for the vector space, where $A$ belongs to the dual space and $b$ belongs to $\mathbb R$. Then, if I decide to choose a new basis for $v$, I have to apply a linear transformation to $v$, which will be a square matrix $B$. Now I write the equation as $A B^{-1} B v = b$. This gives the linear transformation $A$ in the new basis, given by the matrix $A B^{-1}$, which will again be a row matrix. What happened is that we changed the basis of both the vector space and its dual space such that the condition $\langle e^i,e_j\rangle=\delta_{ij}$ is maintained, and because of the way the dual vector transformed in this case, we called it a covariant vector. But this naming is not universal. It is a relative concept and can vary from situation to situation.

because the dual vector space is itself a vector space and the fact that it needs to be cast off as a row matrix is based on how we calculate linear maps and not on what linear maps actually are. If I had defined matrix multiplication differently, this wouldn't have happened.

Now the basis change transformation that we achieved for the dual space could have been achieved the same way as for the vector space itself if we had represented the dual vector as a column vector and separately figured out the basis change, and thus the vector would have transformed as $X A^T$, where $A^T$ denotes the dual vector as a column vector.

So, now the dual vector transforms like the contravariant vector itself under basis change transformation.

So the same transformation can be achieved in any way you like contravariant or covariant. A vector is a vector.

Isomorphic
  • 1,616
0

We cannot say that distinguishing between covariant and contravariant vectors is a bit stupid.

However, there is really no need to restrict a physical vector to be covariant or contravariant. In fact, any vector, such as velocity, gradient or any other type of vector, can be regarded as either covariant or contravariant in the coordinate transformation, although you may find that usually the velocity is referred to as a contravariant vector, and the gradient is referred to as a covariant vector.

By definition, the components of a covariant vector obey the transformation law: $$ \overline A_i = \sum_{j=1}^n \frac {\partial x^j} {\partial \overline x^i} A_j \qquad \qquad (1) $$

and the components of a contravariant vector obey the transformation law: $$ \overline A^i = \sum_{j=1}^n \frac {\partial \overline x^i} {\partial x^j} A^j \qquad \qquad (2)$$

There is no restriction for the types of vectors $A_j$ and $A^j$, and we can perform the coordinate transform according to the rules.
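A sketch of the two rules in a concrete case, assuming the change from Cartesian $(x,y)$ to polar $(r,\theta)$ coordinates and the standard forms of the laws, with $\partial x^j/\partial\overline x^i$ for covariant components and $\partial\overline x^i/\partial x^j$ for contravariant ones. The component values and the helper `jacobian_polar_wrt_cartesian` are made up for illustration.

```python
import numpy as np

def jacobian_polar_wrt_cartesian(x, y):
    """Matrix of d(r, theta)/d(x, y): entry [i, j] is d xbar^i / d x^j."""
    r = np.hypot(x, y)
    return np.array([[x / r,       y / r],
                     [-y / r**2,   x / r**2]])

x, y = 1.0, 1.0
J = jacobian_polar_wrt_cartesian(x, y)
J_inv = np.linalg.inv(J)         # entry [j, i] is d x^j / d xbar^i

A_up = np.array([2.0, -1.0])     # contravariant components (illustrative)
A_down = np.array([0.5, 3.0])    # covariant components (illustrative)

# Contravariant rule: Abar^i = sum_j (d xbar^i / d x^j) A^j
A_up_bar = J @ A_up
# Covariant rule:     Abar_i = sum_j (d x^j / d xbar^i) A_j
A_down_bar = J_inv.T @ A_down

# The contraction A_i A^i is the same in both coordinate systems.
scalar_cartesian = A_down @ A_up
scalar_polar = A_down_bar @ A_up_bar
print(A_up_bar, A_down_bar, scalar_cartesian, scalar_polar)
```

Applying the same rule to both sets of components would not preserve the contraction, because `J` is not orthogonal here.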

The One
  • 11
0

In my pursuit of a second necromancer badge, here's my terse answer.

$x^i$ is a contravariant vector and it transforms as $$ x'^i= \sum_{j=1}^n \frac {\partial x'^i} {\partial x^j} x^j \,.$$ The gradient operator therefore is a covariant vector $$ \frac{\partial}{\partial x'^i}= \sum_{j=1}^n \frac {\partial x^j} {\partial x'^i} \frac{\partial}{\partial x^j} \,.$$ Here you see that the distinction between contra- and covariance occurs naturally and that it would be a mistake to ignore it.

Amit
  • 6,024
my2cts
  • 27,443