Perl efficiency

Static · 12-03-2003 4:48pm #1

Hey,

I'm having a problem deciding on what data structures to use. I'm writing this thing in perl (don't ask why) and basically the data is made up completely of integers... and I need a data structure that'll allow me to do lookups....

I have a list of 'entries' so to speak. Each entry is an integer value, which is paired with an array of integer values. Eg.

1 -> 1,3,4,5
2 -> 1,4,5,5,6
2 -> 2,5,2,2
3 -> 1,3
..
...

and so on. The 'array' of any given entry is of an arbitrary length.

The values on the left are not unique, ie, I could have any number of entries with 2. This kinda rules out the use of a basic hash (needing unique keys)..

Basically, I want a very efficient way of storing this data, and retrieving it. i.e. Based on the example above, I'd like to be able to search by left hand side 2, and get two entries:

2 -> 1,4,5,5,6
2 -> 2,5,2,2

I'd also like to be able to do searching on the right hand side, and not necessarily on the whole array, eg, if I do a search for 4
on the right hand side, I get:

1 -> 1,3,4,5
2 -> 1,4,5,5,6

Now, I can implement this without much effort, but I could have tens of thousands of these entries, with again, an abitrary number of r.h.s. values. Anybody got any ideas on how best to design this data structure (or search algorithms on) so it's *fast*?
Again, I'd like to avoid using string comparisons, etc.

cheers,

barry.

Zaltais · 12-03-2003 6:01pm

First off why are you using perl

Secondly, why are you designing your own datastructure? Why not use a DB with only 2 columns - first column is your l.h.s. second column is the r.h.s.

Only other option I can think of is a hash of arrays for the basic structure, and you're going to have to stringify your r.h.s, unless you want to use a hash of arrays of arrays to access each individual item....

so in a system like this:

1 -> 1,3,4,5
2 -> 1,4,5,5,6
2 -> 2,5,2,2
3 -> 1,3

your access (using a hash of arrays) would be

$data{'1'}[0] gives a string "1,3,4,5"
$data{'2'}[0] gives a string "1,4,5,5,6"
$data{'2'}[1] gives a string "2,5,2,2"
$data{'3'}[0] gives a string "1,3"

Can't think of any way to make this searchable on the r.h.s without building an inverse index though....

Mind you I could be talking sh*te either, and someone will give you a decent answer....

nahdoic · 12-03-2003 8:34pm

well it looks like your sorted for searching on the LHS, that's gonna be fast no matter what.

I guess you're worried about the RHS. If you have to search through each element on the RHS when you're doing a search that's gonna take a while, and not be too efficient.

I think you're asking for the impossible to be honest. If you need it to be really that fast and efficienct for the RHS, then build an inverse look up table and make it so. And an inverse look up table wouldn't really be that much of an overhead either.

Static · 12-03-2003 9:11pm

Zaltais, either using the db and hash of arrays still results in 'stringification'... the comparisons are going to be too much of an overhead..

I don't have much of a choice in the implementation language (for now

)

I'll have a look into inverse lookup tables.

Thanks for the suggestions!

Zaltais · 13-03-2003 9:13am

Other option to ensure that the r.h.s. is not stringified is to use a hash of arrays of arrays.

So to expand on the earlier example in a system like this:

1 -> 1,3,4,5
2 -> 1,4,5,5,6
2 -> 2,5,2,2
3 -> 1,3

your access (using a hash of arrays of arrays) would be

$data{'1'}[0][0] gives 1
$data{'1'}[0][1] gives 3
$data{'1'}[0][2] gives 4
$data{'1'}[0][3] gives 5
$data{'2'}[0][0] gives 1
$data{'2'}[0][1] gives 4
$data{'2'}[0][2] gives 5
$data{'2'}[0][3] gives 5
$data{'2'}[0][4] gives 6
$data{'2'}[1][0] gives 2
$data{'2'}[1][1] gives 5
$data{'2'}[1][2] gives 2
$data{'2'}[1][3] gives 2
$data{'3'}[0][0] gives 1
$data{'3'}[0][0] gives 3

If you needed to access a whole array in one go it still needs to be stringified, but at least you're only stringifying for output, rather than for searching or indexing.

$result = join(',', @{$data{'1'}[0]});

so $result would be 1,3,4,5

Perl efficiency

Comments