Boston Linux & Unix (BLU) Home | Calendar | Mail Lists | List Archives | Desktop SIG | Hardware Hacking SIG
Wiki | Flickr | PicasaWeb | Video | Maps & Directions | Installfests | Keysignings
Linux Cafe | Meeting Notes | Blog | Linux Links | Bling | About BLU

BLU Discuss list archive


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

learning java



Derek Martin wrote:

>>* I have a list of several million strings, but I know that there are only 
>>about 100k unique ones.  I want to make myself a frequency table that 
>>tells me how often each string occurred.  Write me a data structure to do 
>>it.  Everybody makes a hashmap, which is fine, but most people end up 
>>creating several million Integer objects when it can be done by only 
>>creating 100k.
> 
> 
> Not knowing Java, I couldn't write the code to do it; but the obvious
> solution seems to be to create a hash with the string as a key, and
> the frequency would be the value.  I'm having trouble imagining an
> approach which would produce 1M hash objects and provide a correct
> answer...  What's the deal?

What makes it tricky is two things about Java and its standard libraries:

1. You can only put objects in Java collections, not primitive data 
types. Java provides "wrapper" object types for the primitive types 
(Boolean, Byte, Character, Double, Float, Integer, Long, Short, String), 
largely for that reason.

In current versions of Java, this is a pain, because you have to 
explicitly box the primitive types when you put them in the collection 
(using a constructor), and cast (since Java has only generic 
collections) and unbox (using a method of Integer or whatever) when you 
take them out. Java 1.5 introduces templated collections, like C++, and 
automatic boxing and unboxing of primitive types; that will simplify the 
code, but the efficiency cost remains.

2. The wrapper objects provided by Java are immutable. You can set their 
values only in their constructors.

If you put Integers in your hash table, you will have to create a new 
object each time you change the value. If you want to avoid that 
overhead, you instead have to create a new class of object, say 
MutableInteger, that has a set() method or has the int as a public 
instance variable. In a real application, this comes more naturally; you 
will usually want to store other data in addition to the count of each 
string, so you create a class to hold all of it.

Immutability of objects is sometimes a good thing. For instance, if you 
have a class, all of the members of which are immutable objects, you can 
safely do shallow rather than deep copies of that object. If the member 
objects can be modified, you have to do a deep copy if you clone your 
object. Here, however, immutability gets in your way, forcing you to 
create an extra class rather than using one from the library.




BLU is a member of BostonUserGroups
BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.

Valid HTML 4.01! Valid CSS!



Boston Linux & Unix / webmaster@blu.org