Discussion:
[Tockit-general] Relational algebra
Peter Becker
2003-08-15 06:42:15 UTC
Permalink
Hello all,

some of you might be interested: I just started the package
org.tockit.relations in the Tockit Java branch. The idea is to implement
a rather complete (although not necessarily efficient) implementation
for relational algebra. At the moment I moved bits of the Tupleware
model into it, operations (selection, projection, union, crossproduct,
intersection...) will follow. The model allows attaching names to the
dimensions, but this is optional. The sources as used in Tupleware with
their UI parts might follow later on -- we probably should clean up some
bits first.

In case you have specific interests in this area feel free to ask for
features -- they might be easy to do while I am working on this topic
anyway.

Peter
J. Hereth Correia
2003-08-15 07:33:03 UTC
Permalink
Post by Peter Becker
Hello all,
some of you might be interested: I just started the package
org.tockit.relations in the Tockit Java branch. The idea is to implement
a rather complete (although not necessarily efficient) implementation
for relational algebra. At the moment I moved bits of the Tupleware
model into it, operations (selection, projection, union, crossproduct,
intersection...) will follow. The model allows attaching names to the
dimensions, but this is optional. The sources as used in Tupleware with
their UI parts might follow later on -- we probably should clean up some
bits first.
In case you have specific interests in this area feel free to ask for
features -- they might be easy to do while I am working on this topic
anyway.
Peter
Hi Peter,

I think it's good to work on those basic structures. I still hadn't the
time to look into tupleware, so I have only two very general ideas. The
first one is about naming - make it usable, i.e. allow for selecting
columns by name, provide the interface to ask for all column names.
Another point would be the domain of each column, e.g. to know that in
the 'PLZ' column only postal codes are allowed. So far, I have no idea
how this could be implemented, maybe for the start simply a string with
a description, which might be also an URI identifying a set.

About the operations, i'll have to wait until I see what you have done.
Of course, I'd like to see the operations introduced by Peirce:

Permutation: Reordering of columns, makes no difference when using names

Juxtaposition: another name for cross product

Join: select two columns of a relation. The resulting relation has all
tuples that have the same values in those two columns and these two
columns are *removed*

Negation: The complement of the relation. neg(R) is the cross product of
the domains of the columns minus R.

Teridentity: a constant, the set of all triples (x,x,x) for any x.



So far, so Peirce ;-)

And finally, I'd like to have transitive closure for binary relations.


That's all :-)

Cheers, keep on the good work

Jo
Post by Peter Becker
-------------------------------------------------------
This SF.Net email sponsored by: Free pre-built ASP.NET sites including
Data Reports, E-commerce, Portals, and Forums are available now.
Download today and enter to win an XBOX or Visual Studio .NET.
http://aspnet.click-url.com/go/psa00100003ave/direct;at.aspnet_072303_01/01
_______________________________________________
Tockit-general mailing list
https://lists.sourceforge.net/lists/listinfo/tockit-general
--
J. Hereth Correia <***@mathematik.tu-darmstadt.de>
Darmstadt University of Technology, Department of Mathematics
Peter Becker
2003-08-15 09:06:01 UTC
Permalink
Post by J. Hereth Correia
Post by Peter Becker
Hello all,
some of you might be interested: I just started the package
org.tockit.relations in the Tockit Java branch. The idea is to implement
a rather complete (although not necessarily efficient) implementation
for relational algebra. At the moment I moved bits of the Tupleware
model into it, operations (selection, projection, union, crossproduct,
intersection...) will follow. The model allows attaching names to the
dimensions, but this is optional. The sources as used in Tupleware with
their UI parts might follow later on -- we probably should clean up some
bits first.
In case you have specific interests in this area feel free to ask for
features -- they might be easy to do while I am working on this topic
anyway.
Peter
Hi Peter,
I think it's good to work on those basic structures. I still hadn't the
time to look into tupleware, so I have only two very general ideas. The
first one is about naming - make it usable, i.e. allow for selecting
columns by name, provide the interface to ask for all column names.
That is something I consider to be application-specific. Tupleware
handles the column names (at the moment I call them "dimension names",
but I don't really like it). It gets them either from the first line
(text file/CLI input), or the query variable names (SQL/RDQL). But the
library should keep them optional IMO.
Post by J. Hereth Correia
Another point would be the domain of each column, e.g. to know that in
the 'PLZ' column only postal codes are allowed. So far, I have no idea
how this could be implemented, maybe for the start simply a string with
a description, which might be also an URI identifying a set.
I don't want to go down this road yet (but probably later). In Java I'd
probably use what I call "dynamic RTTI" -- I'd attach a Class[] to the
relation and check the Object[] in the tuples to be instances
elementwise. If you want other mappings, e.g. to SQL types, XSD types or
something else, you can wrap that in Java types (the SQL mappings are
actually part of JDBC and the XSD mappings is something I want to do for
Siena).
Post by J. Hereth Correia
About the operations, i'll have to wait until I see what you have done.
Permutation: Reordering of columns, makes no difference when using names
Do you think I should combine that with projection? I haven't made up my
mind about the interface for projection yet -- I think it should get an
int[] as parameter, denoting the dimensions to leave in. Should they be
assumed to be in order or do we allow implicit permutations? In the
latter case we could treat permutation as a projection onto as many
dimensions as the input. Alternatively one could give dimensions to
drop. Or even this:

public class ProjectionOperator implement UnaryRelationOperator {

public static ProjectionOperator getPositiveOperator(int[]
dimensionsToPick);
public static ProjectionOperator getNegativeOperator(int[]
dimensionsToDrop);
public static ProjectionOperator getNegativeOperator(int
dimensionToDrop);
public static ProjectionOperator getPermutationOperator(int[]
dimensionsToPermute);

// non-factory code

}
Post by J. Hereth Correia
Juxtaposition: another name for cross product
Can we call it crossproduct then?
Post by J. Hereth Correia
Join: select two columns of a relation. The resulting relation has all
tuples that have the same values in those two columns and these two
columns are *removed*
Sure. I forgot to enlist that one, but of course it should be there.
Post by J. Hereth Correia
Negation: The complement of the relation. neg(R) is the cross product of
the domains of the columns minus R.
That is more tricky since it involves understanding what the domains
are. Of course we could give the constructor/factory method a Set[] with
the full domains.
Post by J. Hereth Correia
Teridentity: a constant, the set of all triples (x,x,x) for any x.
That is not an operator, is it? But we could have some method to get the
teridentity for a particular set. The current model will not be able to
handle the general case since relations are sets of tuples.
Post by J. Hereth Correia
So far, so Peirce ;-)
And finally, I'd like to have transitive closure for binary relations.
Forgot about this one. Any other closures?
Post by J. Hereth Correia
That's all :-)
Since I don't care about efficiency for now these things should be all
easy to do. I try to get a rather complete set.

Peter
J. Hereth Correia
2003-08-15 09:55:02 UTC
Permalink
...
Post by Peter Becker
Post by J. Hereth Correia
Hi Peter,
I think it's good to work on those basic structures. I still hadn't the
time to look into tupleware, so I have only two very general ideas. The
first one is about naming - make it usable, i.e. allow for selecting
columns by name, provide the interface to ask for all column names.
That is something I consider to be application-specific. Tupleware
handles the column names (at the moment I call them "dimension names",
but I don't really like it). It gets them either from the first line
(text file/CLI input), or the query variable names (SQL/RDQL). But the
library should keep them optional IMO.
Yes, it should be optional, I agree. But if the columns are named it
should be possible to apply all operations with using names, not caring
for order.

...
Post by Peter Becker
Post by J. Hereth Correia
About the operations, i'll have to wait until I see what you have done.
Permutation: Reordering of columns, makes no difference when using names
Do you think I should combine that with projection?
No, it is unusual to consider permutation a projection. The projection
should _not_ change the order of existing columns (independent from the
order in which the indices are given).
Post by Peter Becker
I haven't made up my
mind about the interface for projection yet -- I think it should get an
int[] as parameter, denoting the dimensions to leave in. Should they be
assumed to be in order or do we allow implicit permutations? In the
latter case we could treat permutation as a projection onto as many
dimensions as the input. Alternatively one could give dimensions to
public class ProjectionOperator implement UnaryRelationOperator {
public static ProjectionOperator getPositiveOperator(int[]
dimensionsToPick);
public static ProjectionOperator getNegativeOperator(int[]
dimensionsToDrop);
public static ProjectionOperator getNegativeOperator(int
dimensionToDrop);
public static ProjectionOperator getPermutationOperator(int[]
dimensionsToPermute);
Huh? Looks complicated. Why not

public class Relation {
public Relation project(int[] columnsToPick)
}

this is the usual projection operator. Permutation should be an operator
of its own, it's not a projection.
Post by Peter Becker
// non-factory code
}
Post by J. Hereth Correia
Juxtaposition: another name for cross product
Can we call it crossproduct then?
I think so ;-)
Post by Peter Becker
Post by J. Hereth Correia
Join: select two columns of a relation. The resulting relation has all
tuples that have the same values in those two columns and these two
columns are *removed*
Sure. I forgot to enlist that one, but of course it should be there.
*Both* columns are removed. That's a difference to the Join operation in
SQL.
Post by Peter Becker
Post by J. Hereth Correia
Negation: The complement of the relation. neg(R) is the cross product of
the domains of the columns minus R.
That is more tricky since it involves understanding what the domains
are. Of course we could give the constructor/factory method a Set[] with
the full domains.
Sure, negation is complicated - ask Frithjof ;-) I think this is a very
expensive operation, but it should be there for completenes' sake.
Post by Peter Becker
Post by J. Hereth Correia
Teridentity: a constant, the set of all triples (x,x,x) for any x.
That is not an operator, is it? But we could have some method to get the
teridentity for a particular set. The current model will not be able to
handle the general case since relations are sets of tuples.
Constants are considered to be 0-ary Operations. I think for the
beginning it is useful to consider Relational Algebras over a Set A
which should be given in the constructor.
Post by Peter Becker
Post by J. Hereth Correia
So far, so Peirce ;-)
And finally, I'd like to have transitive closure for binary relations.
Forgot about this one. Any other closures?
Sure, symmetric closure for example. But I think transitive closure is
the most interesting.
Post by Peter Becker
Post by J. Hereth Correia
That's all :-)
Since I don't care about efficiency for now these things should be all
easy to do. I try to get a rather complete set.
Sounds good.

Cheers,

Jo
Post by Peter Becker
Peter
--
J. Hereth Correia <***@mathematik.tu-darmstadt.de>
Darmstadt University of Technology, Department of Mathematics
Peter Becker
2003-08-15 12:20:07 UTC
Permalink
Post by J. Hereth Correia
...
Post by Peter Becker
Post by J. Hereth Correia
Hi Peter,
I think it's good to work on those basic structures. I still hadn't the
time to look into tupleware, so I have only two very general ideas. The
first one is about naming - make it usable, i.e. allow for selecting
columns by name, provide the interface to ask for all column names.
That is something I consider to be application-specific. Tupleware
handles the column names (at the moment I call them "dimension names",
but I don't really like it). It gets them either from the first line
(text file/CLI input), or the query variable names (SQL/RDQL). But the
library should keep them optional IMO.
Yes, it should be optional, I agree. But if the columns are named it
should be possible to apply all operations with using names, not caring
for order.
Hmmm... that would also imply that I'd have to check for uniqueness. Let
me skip this for now, it can always be added later.
Post by J. Hereth Correia
...
Post by Peter Becker
Post by J. Hereth Correia
About the operations, i'll have to wait until I see what you have done.
Permutation: Reordering of columns, makes no difference when using names
Do you think I should combine that with projection?
No, it is unusual to consider permutation a projection. The projection
should _not_ change the order of existing columns (independent from the
order in which the indices are given).
Post by Peter Becker
I haven't made up my
mind about the interface for projection yet -- I think it should get an
int[] as parameter, denoting the dimensions to leave in. Should they be
assumed to be in order or do we allow implicit permutations? In the
latter case we could treat permutation as a projection onto as many
dimensions as the input. Alternatively one could give dimensions to
public class ProjectionOperator implement UnaryRelationOperator {
public static ProjectionOperator getPositiveOperator(int[]
dimensionsToPick);
public static ProjectionOperator getNegativeOperator(int[]
dimensionsToDrop);
public static ProjectionOperator getNegativeOperator(int
dimensionToDrop);
public static ProjectionOperator getPermutationOperator(int[]
dimensionsToPermute);
Huh? Looks complicated. Why not
public class Relation {
public Relation project(int[] columnsToPick)
}
Since the OO-approach is extremely limited. Operations are separate
entities and don't necessarily map onto a particular class of their
domain. Does a matrix * vector operation belong to Matrix or Vector?
What about vector * matrix? I think the idea of putting these operations
into the classes is not too smart. And unless you want to override
things there is no need to do this.

But the real reason is that modelling the operations as types allows us
to combine them. You just group a bunch of them together by
concatenating the operations. Such a thing is another operation object.
If you put the operations into the Relation class you can't do that,
you'd have to pass sequences of operations around somehow -- either as
java.lang.Methods or via some other mechanism to call the right things,
e.g. nasty constants.
Post by J. Hereth Correia
this is the usual projection operator. Permutation should be an operator
of its own, it's not a projection.
Yes, I thought you would say that :-) And I agree, I just wanted to
present the alternative. On the implementation side they might share
code, though.


[...]
Post by J. Hereth Correia
Post by Peter Becker
Post by J. Hereth Correia
Join: select two columns of a relation. The resulting relation has all
tuples that have the same values in those two columns and these two
columns are *removed*
Sure. I forgot to enlist that one, but of course it should be there.
*Both* columns are removed. That's a difference to the Join operation in
SQL.
Sounds fair enough. Although you could achieve that effect by the
SQL-join and a projection, while I can't see any easy way to get it the
other way round.
Post by J. Hereth Correia
Post by Peter Becker
Post by J. Hereth Correia
Negation: The complement of the relation. neg(R) is the cross product of
the domains of the columns minus R.
That is more tricky since it involves understanding what the domains
are. Of course we could give the constructor/factory method a Set[] with
the full domains.
Sure, negation is complicated - ask Frithjof ;-) I think this is a very
expensive operation, but it should be there for completenes' sake.
Ok. But the operator will need the Set[]. Then it will be expensive, but
not complicated -- negation in a finite domain is not that hard :-)
Post by J. Hereth Correia
Post by Peter Becker
Post by J. Hereth Correia
Teridentity: a constant, the set of all triples (x,x,x) for any x.
That is not an operator, is it? But we could have some method to get the
teridentity for a particular set. The current model will not be able to
handle the general case since relations are sets of tuples.
Constants are considered to be 0-ary Operations. I think for the
beginning it is useful to consider Relational Algebras over a Set A
which should be given in the constructor.
Won't work for Tupleware. I need different domains there. And dynamic
domains. But the teridentity would be defined on particular sets.
Post by J. Hereth Correia
Post by Peter Becker
Post by J. Hereth Correia
So far, so Peirce ;-)
And finally, I'd like to have transitive closure for binary relations.
Forgot about this one. Any other closures?
Sure, symmetric closure for example. But I think transitive closure is
the most interesting.
How is symmetric closure defined. I guess I know how, but you better
tell me again ;-)

[...]

Peter
J. Hereth Correia
2003-08-16 06:28:04 UTC
Permalink
...

[Deleted all topics we can agree on (wow, so many ;-)]
Post by Peter Becker
Post by J. Hereth Correia
Post by Peter Becker
Post by J. Hereth Correia
Join: select two columns of a relation. The resulting relation has all
tuples that have the same values in those two columns and these two
columns are *removed*
Sure. I forgot to enlist that one, but of course it should be there.
*Both* columns are removed. That's a difference to the Join operation in
SQL.
Sounds fair enough. Although you could achieve that effect by the
SQL-join and a projection, while I can't see any easy way to get it the
other way round.
The other wa: Product with Teridentity, join the column of R that you
want to join with a column from S with a column of the Teridentity
first, effectively doubling the column. Now, you can join of these two
columns with the corresponding column in S without loosing the original
values.

If this is easy or not depends on the notation. Peirce had a nice
graphical representation of relations and in this notation the
operations I wrote here are fundamental. In the algebraic notation they
are a bit more complicated, agreed.
Post by Peter Becker
Post by J. Hereth Correia
Post by Peter Becker
Post by J. Hereth Correia
Negation: The complement of the relation. neg(R) is the cross product of
the domains of the columns minus R.
That is more tricky since it involves understanding what the domains
are. Of course we could give the constructor/factory method a Set[] with
the full domains.
Sure, negation is complicated - ask Frithjof ;-) I think this is a very
expensive operation, but it should be there for completenes' sake.
Ok. But the operator will need the Set[]. Then it will be expensive, but
not complicated -- negation in a finite domain is not that hard :-)
Just noticed: If we have negation (or union) the algebra is not a
relational algebra anymore. Then it becomes a Krasner algebra (if only
union is involved it's a weak Krasner algebra).

However, for a useful treatment of relations (as in Perices Logic of
Relations), I think Negation is important. I therefore propose to
include it.
Post by Peter Becker
Post by J. Hereth Correia
Post by Peter Becker
Post by J. Hereth Correia
Teridentity: a constant, the set of all triples (x,x,x) for any x.
That is not an operator, is it? But we could have some method to get the
teridentity for a particular set. The current model will not be able to
handle the general case since relations are sets of tuples.
Constants are considered to be 0-ary Operations. I think for the
beginning it is useful to consider Relational Algebras over a Set A
which should be given in the constructor.
Won't work for Tupleware. I need different domains there. And dynamic
domains. But the teridentity would be defined on particular sets.
Actually, I have no clear idea how Relational/Krasner algebra are
defined for the multi-set case. But there should be some literature as
they are necessary to provide domain-safe algebras for the relational
query languages.When we run into problems (or into major discussions ;-)
) I'll look things up.

For real relational algebras and weak Kranser algebras this should be
no problem. With negation it becomes one... (negation is hard...).
Post by Peter Becker
Post by J. Hereth Correia
Post by Peter Becker
Post by J. Hereth Correia
So far, so Peirce ;-)
And finally, I'd like to have transitive closure for binary relations.
Forgot about this one. Any other closures?
Sure, symmetric closure for example. But I think transitive closure is
the most interesting.
How is symmetric closure defined. I guess I know how, but you better
tell me again ;-)
If you look up any math book about order theory you will find some
Classifications of binary relations like (anti-)symmetric and others. All
positive definitions can be turned into closure operations. For example
symmetrie is defined as:

(a,b) \in R => (b,a) \in R

Then you define the closure by

R_0 := R
R_{i+1}:=R_i \union \{ (b,a) | (a,b) \in R_i \}
R^*:=\bigunion_{i=0}^{\infty} R_i

Of course, for symmetrie we have R^*=R_1, but this is to provide the general
Schema. If you look at transitivity you see that there will be several
iterations.

Wish ya a nice weekend

Jo
Post by Peter Becker
[...]
Peter
-------------------------------------------------------
This SF.Net email sponsored by: Free pre-built ASP.NET sites including
Data Reports, E-commerce, Portals, and Forums are available now.
Download today and enter to win an XBOX or Visual Studio .NET.
http://aspnet.click-url.com/go/psa00100003ave/direct;at.aspnet_072303_01/01
_______________________________________________
Tockit-general mailing list
https://lists.sourceforge.net/lists/listinfo/tockit-general
--
J. Hereth Correia <***@mathematik.tu-darmstadt.de>
Darmstadt University of Technology, Department of Mathematics
Peter Becker
2003-08-17 11:14:03 UTC
Permalink
Hi Jo,

I've implemented the model bit and the framework for the operators so
far (interfaces, abstract classes, concatenation) -- the only actual
operator I've implemented so far is the identity. But having the
framework things should be really easy, so I'd like to implement pretty
much everything we can think of and then add some lists of which
operators provide which algebras if taken together. Another option would
be grouping the relations of one algebra into factories, e.g.

public class PeircesAlgebra {
public static RelationOperator getRelationalJoin(int leftHandColumn,
int rightHandColumn);
public static RelationOperator getTeridentity(Set domain);
...
}

If you have a bit of spare time, take a look at the existing framework
-- there is quite a bit of JavaDoc (at least on the interfaces) and the
model has test cases. The tests for the operations will follow once
there are operations to test (testing identity did seem a bit overkill).
Does the code constitute a reasonable approach from the math point of view?

Peter
Post by J. Hereth Correia
...
[Deleted all topics we can agree on (wow, so many ;-)]
Post by Peter Becker
Post by J. Hereth Correia
Post by Peter Becker
Post by J. Hereth Correia
Join: select two columns of a relation. The resulting relation has all
tuples that have the same values in those two columns and these two
columns are *removed*
Sure. I forgot to enlist that one, but of course it should be there.
*Both* columns are removed. That's a difference to the Join operation in
SQL.
Sounds fair enough. Although you could achieve that effect by the
SQL-join and a projection, while I can't see any easy way to get it the
other way round.
The other wa: Product with Teridentity, join the column of R that you
want to join with a column from S with a column of the Teridentity
first, effectively doubling the column. Now, you can join of these two
columns with the corresponding column in S without loosing the original
values.
If this is easy or not depends on the notation. Peirce had a nice
graphical representation of relations and in this notation the
operations I wrote here are fundamental. In the algebraic notation they
are a bit more complicated, agreed.
Post by Peter Becker
Post by J. Hereth Correia
Post by Peter Becker
Post by J. Hereth Correia
Negation: The complement of the relation. neg(R) is the cross product of
the domains of the columns minus R.
That is more tricky since it involves understanding what the domains
are. Of course we could give the constructor/factory method a Set[] with
the full domains.
Sure, negation is complicated - ask Frithjof ;-) I think this is a very
expensive operation, but it should be there for completenes' sake.
Ok. But the operator will need the Set[]. Then it will be expensive, but
not complicated -- negation in a finite domain is not that hard :-)
Just noticed: If we have negation (or union) the algebra is not a
relational algebra anymore. Then it becomes a Krasner algebra (if only
union is involved it's a weak Krasner algebra).
However, for a useful treatment of relations (as in Perices Logic of
Relations), I think Negation is important. I therefore propose to
include it.
Post by Peter Becker
Post by J. Hereth Correia
Post by Peter Becker
Post by J. Hereth Correia
Teridentity: a constant, the set of all triples (x,x,x) for any x.
That is not an operator, is it? But we could have some method to get the
teridentity for a particular set. The current model will not be able to
handle the general case since relations are sets of tuples.
Constants are considered to be 0-ary Operations. I think for the
beginning it is useful to consider Relational Algebras over a Set A
which should be given in the constructor.
Won't work for Tupleware. I need different domains there. And dynamic
domains. But the teridentity would be defined on particular sets.
Actually, I have no clear idea how Relational/Krasner algebra are
defined for the multi-set case. But there should be some literature as
they are necessary to provide domain-safe algebras for the relational
query languages.When we run into problems (or into major discussions ;-)
) I'll look things up.
For real relational algebras and weak Kranser algebras this should be
no problem. With negation it becomes one... (negation is hard...).
Post by Peter Becker
Post by J. Hereth Correia
Post by Peter Becker
Post by J. Hereth Correia
So far, so Peirce ;-)
And finally, I'd like to have transitive closure for binary relations.
Forgot about this one. Any other closures?
Sure, symmetric closure for example. But I think transitive closure is
the most interesting.
How is symmetric closure defined. I guess I know how, but you better
tell me again ;-)
If you look up any math book about order theory you will find some
Classifications of binary relations like (anti-)symmetric and others. All
positive definitions can be turned into closure operations. For example
(a,b) \in R => (b,a) \in R
Then you define the closure by
R_0 := R
R_{i+1}:=R_i \union \{ (b,a) | (a,b) \in R_i \}
R^*:=\bigunion_{i=0}^{\infty} R_i
Of course, for symmetrie we have R^*=R_1, but this is to provide the general
Schema. If you look at transitivity you see that there will be several
iterations.
Wish ya a nice weekend
Jo
Post by Peter Becker
[...]
Peter
-------------------------------------------------------
This SF.Net email sponsored by: Free pre-built ASP.NET sites including
Data Reports, E-commerce, Portals, and Forums are available now.
Download today and enter to win an XBOX or Visual Studio .NET.
http://aspnet.click-url.com/go/psa00100003ave/direct;at.aspnet_072303_01/01
_______________________________________________
Tockit-general mailing list
https://lists.sourceforge.net/lists/listinfo/tockit-general
Loading...