[cse491] hw11 db name loading differences

C. Titus Brown ctb at msu.edu
Wed Nov 12 20:53:38 PST 2008


On Wed, Nov 12, 2008 at 10:25:46PM -0500, Alex Nolley wrote:
-> I'm having some trouble getting the database to recognize when two entries
-> refer to the same movie/show. This problem is occurring with the
-> database loading module that Titus gave us. For example, while looking at
-> the database for a good pair of actors to test, I noticed that Austin, Tony
-> (II) and Banner, David (I) were both in Def Jam Fight for NY. However, the
-> db loading script is importing the string 'Def Jam Fight for NY (2004) (VG)
-> (voice)  [Teck]  <73>'  for Austin, Tony (II) and the separate string 'Def
-> Jam Fight for NY (2004) (VG)  (voice)  [Himself]  <23>' for Banner, David
-> (I). Since the strings aren't the same, the database assigns different
-> movie_id's to them, causing my intersecting searches to turn up nothing.
-> 
-> Should we do some extra processing to remove all the information after the
-> title? I can imagine doing a split by '(' and then taking the [0] entry, but
-> what if a movie has '(' in its title?

A generic solution to this would be much appreciated :).  In the absence
of that, just load the data as-is.
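
One possible direction, as a rough sketch rather than a tested solution:
instead of splitting on the first '(', anchor on the year token, so a
title that contains '(' itself survives. This assumes every entry has
a "(YYYY)" or "(????)" year, optionally followed by (V), (TV), or (VG),
and that role and billing details come after that. The names TITLE_KEY
and movie_key are made up for illustration; they are not part of the
loading module. A title that itself contains a year-like token such as
"(1999)" would still be cut short.

```python
import re

# Assumed entry layout: <title> (<year>) [(V)|(TV)|(VG)] <role/billing details>
TITLE_KEY = re.compile(
    r"""
    ^\s*
    (?P<key>
        .+?                                  # title text; may itself contain '('
        \s\((?:\d{4}|\?{4})(?:/[IVXLC]+)?\)  # year token: (2004), (????), (2004/II)
        (?:\s+\((?:V|TV|VG)\))*              # optional type markers
    )
    """,
    re.VERBOSE,
)


def movie_key(raw):
    """Return the title/year/type part of an entry, dropping role details.

    If no year token is found, the string is returned stripped but
    otherwise as-is, which matches the "load the data as-is" fallback.
    """
    m = TITLE_KEY.match(raw)
    if m is None:
        return raw.strip()
    # Collapse runs of whitespace so spacing differences do not matter.
    return " ".join(m.group("key").split())


if __name__ == "__main__":
    a = "Def Jam Fight for NY (2004) (VG) (voice)  [Teck]  <73>"
    b = "Def Jam Fight for NY (2004) (VG)  (voice)  [Himself]  <23>"
    print(movie_key(a))
    print(movie_key(a) == movie_key(b))
```

With the two strings from the example, both entries should reduce to
the same key, so they would get the same movie_id.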

How does IMDB handle it, anyway?!

--t
-- 
C. Titus Brown, ctb at msu.edu


