[cse491] hw11 db name loading differences
C. Titus Brown
ctb at msu.edu
Wed Nov 12 20:53:38 PST 2008
On Wed, Nov 12, 2008 at 10:25:46PM -0500, Alex Nolley wrote:
-> I'm having some trouble getting the database to recognize identical movies/shows
-> as actually being the same. This problem is occurring with the
-> database loading module that Titus gave us. For example, while looking at
-> the database for a good pair of actors to test, I noticed that Austin, Tony
-> (II) and Banner, David (I) were both in Def Jam Fight for NY. However, the
-> db loading script is importing the string 'Def Jam Fight for NY (2004) (VG)
-> (voice) [Teck] <73>' for Austin, Tony (II) and the separate string 'Def
-> Jam Fight for NY (2004) (VG) (voice) [Himself] <23>' for Banner, David
-> (I). Since the strings aren't the same, the database assigns different
-> movie_ids to them, causing my intersecting searches to turn up nothing.
->
-> Should we do some extra processing to remove all the information after the
-> title? I can imagine doing a split by '(' and then taking the [0] entry, but
-> what if a movie has '(' in its title?
A generic solution to this would be much appreciated :). In the absence
of that, just load the data as-is.
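Here is a rough sketch of one possible generic approach, in Python. It assumes every credit string follows the layout of the two examples above: the title, then the year in parentheses, then an optional kind marker such as (VG), followed by per-actor notes like (voice), [Character] and <billing>. It also assumes (not verified against the full dump) that IMDB may write the year as (2000/I) or (????) and may append an {episode} block. Rather than splitting on '(', it keeps everything up to and including the first parenthesized year group, so parentheses earlier in a title survive. The names normalize_title and shared_titles are hypothetical and are not part of the loading module.

```python
import re

# Assumed credit layout, inferred from the two example strings in this thread
# (an assumption, not checked against the whole IMDB data set):
#   Title (Year[/I]) [(TV)|(V)|(VG)] [{Episode}] (notes) [Character] <billing>
_TITLE_RE = re.compile(
    r"""
    .*?\s\((?:\d{4}|\?{4})(?:/[IVXLC]+)?\)   # title through the first (YYYY) or (YYYY/I) group
    (?:\s\((?:TV|V|VG)\))?                   # optional media-kind marker, kept in the key
    (?:\s\{[^}]*\})?                         # optional {episode title}, kept in the key
    """,
    re.VERBOSE,
)


def normalize_title(credit: str) -> str:
    """Strip per-actor annotations, keeping title, year, kind and episode."""
    credit = credit.strip()
    match = _TITLE_RE.match(credit)  # match() anchors at the start of the string
    if match is None:
        # Unrecognized layout: fall back to loading the string as-is.
        return credit
    return match.group(0)


def shared_titles(credits_a, credits_b):
    """Titles that appear in both actors' credit lists, after normalization."""
    return {normalize_title(c) for c in credits_a} & {normalize_title(c) for c in credits_b}


if __name__ == "__main__":
    tony = ["Def Jam Fight for NY (2004) (VG) (voice) [Teck] <73>"]
    david = ["Def Jam Fight for NY (2004) (VG) (voice) [Himself] <23>"]
    print(shared_titles(tony, david))  # {'Def Jam Fight for NY (2004) (VG)'}
```

In the loader, the normalized string (rather than the raw credit) would then be the key used to look up or assign a movie_id. It is only a heuristic: a title that itself contains something like " (1999)" before its real year would be cut short, and any string that doesn't match is simply loaded as-is.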
How does IMDB handle it, anyway?!
--t
--
C. Titus Brown, ctb at msu.edu
More information about the cse491-fall-2008 mailing list