MobyGames Scraper. Anybody Interested?

dustind900

Member
Supporter
RL Member
I am currently making this for myself, but if others are interested I would be willing to share.
Would there be any interest in such an app? If so, what would be some features you would like to see?

This scraper will be aimed at HyperLaunch specifically. By that I mean scraped info will be added (optionally) to HyperLaunchHQ database XML files and/or HyperPause GameInfo ini files.

I see two ways of doing this. One would be adding the info to the ini and XML files and setting the XML description tag to match the ini section. Or two, name the ini section to match the XML database name. This way, when using HyperPause, the GameInfo will always show, without having to rely on parsing game names in the ini files.
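
A rough sketch of the second option, in Python purely for illustration. It assumes each `<game>` element in the database carries a `name` attribute and that the ini file has one section per game. The key names (`Publisher`, `Developer`), the `build_game_info` helper, and the sample data are placeholders, not the actual HyperPause GameInfo layout:

```python
import configparser
import io
import xml.etree.ElementTree as ET

# Hypothetical database snippet; real files will have more fields.
DATABASE_XML = """<menu>
  <game name="10-Yard Fight (USA, Europe)">
    <description>10-Yard Fight</description>
  </game>
</menu>"""

# Hypothetical scraped data, keyed by the database game name.
SCRAPED = {
    "10-Yard Fight (USA, Europe)": {
        "Publisher": "Nintendo of America Inc.",
        "Developer": "Irem Corp.",
    },
}


def build_game_info(database_xml, scraped):
    """Return ini text with one section per <game name="..."> in the database."""
    ini = configparser.ConfigParser(interpolation=None)
    ini.optionxform = str  # keep key capitalisation as written
    root = ET.fromstring(database_xml)
    for game in root.iter("game"):
        name = game.get("name")
        if name in scraped:
            ini[name] = scraped[name]  # section name == database game name
    out = io.StringIO()
    ini.write(out)
    return out.getvalue()


ini_text = build_game_info(DATABASE_XML, SCRAPED)
print(ini_text)
```

Because the section name is copied straight from the database, the lookup needs no fuzzy matching of game titles.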

And considering there are pre-existing ini files, a feature will be added to modify existing files instead of creating new ones.

And yes, artwork too...

So, if you are interested at all, drop a post and leave me some suggestions.
 

brolly

Administrator
Developer
I'd be interested in something like this, not so much for creating databases, but it would be useful to double-check existing info or to add extra information to the databases in the future, since we have plans to add some extra fields to them.
I think it would be better to make the scraped info public instead of the actual scraper though, because this could lead to hundreds of users scraping Moby and hitting their servers badly, and that would surely piss them off. Or at least keep it under a restricted group of users.
If you scrape all the relevant data then your files can be used to generate any other format needed.
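
To illustrate the server-load concern, here is a minimal Python sketch of a fetch wrapper that caches pages and pauses between real requests. The `PoliteFetcher` class, the five-second default, and the `example.invalid` URLs are all made up for the example; it is not part of any scraper discussed here, and the actual download function is left to the caller:

```python
import time


class PoliteFetcher:
    """Fetch each URL at most once and pause between real requests."""

    def __init__(self, fetch, min_delay=5.0, sleep=time.sleep, clock=time.monotonic):
        self._fetch = fetch          # callable(url) -> str, supplied by the caller
        self._min_delay = min_delay  # seconds between requests (arbitrary default)
        self._sleep = sleep
        self._clock = clock
        self._cache = {}
        self._last_request = None

    def get(self, url):
        if url in self._cache:
            return self._cache[url]  # repeat lookups cause no network traffic
        if self._last_request is not None:
            wait = self._min_delay - (self._clock() - self._last_request)
            if wait > 0:
                self._sleep(wait)
        self._cache[url] = self._fetch(url)
        self._last_request = self._clock()
        return self._cache[url]


# Demo with a fake fetch function so nothing is actually downloaded.
calls = []


def fake_fetch(url):
    calls.append(url)
    return "<html>page for %s</html>" % url


fetcher = PoliteFetcher(fake_fetch, min_delay=0.0)
fetcher.get("http://example.invalid/game/1")
fetcher.get("http://example.invalid/game/1")  # served from the cache
fetcher.get("http://example.invalid/game/2")
print(len(calls))
```

Even with throttling, publishing the scraped files once still puts far less load on the site than many users each running the scraper.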
 

dustind900

Member
Supporter
RL Member
Thank you for some input on this. If you don't mind me asking what are the extra fields you are thinking of adding?

I myself think that all the gameInfo should be included in the DB and the DB should be converted to ini, with no need for two separate files, but that's just me. I can work with anything. :)
 

brolly

Administrator
Developer
Database format is irrelevant really; this will vary from frontend to frontend, and as it is HyperLaunch doesn't even have its own database format, even though we are considering building one. Ini files aren't really a good format for databases, as they don't allow multiple levels, for instance, and this might come in handy in some situations.

We had a draft made with the new fields a long time ago, I'd need to search for it, but stuff like developer, publisher, number of players, license, critic ratings, a proper genre system with multiple levels, or even going further and having the ability to relate the same game across multiple platforms, and so on. Most of this info is available on Moby so it's just a matter of scraping it all. We have plenty of ideas for the future.
I'm happy to work with you through this if you're interested; I can't be doing any actual code though, since I already have plenty on my plate.
 

dustind900

Member
Supporter
RL Member
Yeah, I'm down. I will scrape a system to an XML file based off your post above and submit it to you later. Then we can go from there. Moby seems to be lacking in the rating department though.
 

dustind900

Member
Supporter
RL Member
Here is an example XML section I made. I know it's not the best, but I figured I should get the file format down before I get too crazy. XML is not a strong point of mine.

Code:
<game name="10-Yard Fight (USA, Europe)" index="true" image="1">
	<cleanName>10-Yard Fight</cleanName>
	<description>
		The main idea of this game is that players take control of a football team and have the task of trying to score a touchdown before the clock runs out.
		Players start out as a high school team. Games consists of two halves. One touchdown must be scored before time runs out in each half or the game is lost. After a touchdown is scored, the half ends and the 2nd half of the game starts. If the player scores in both halves, they win the game, and then move on to the next level of difficulty. Difficulty levels after high school go from college, pro, then to super. The higher the difficulty, the less amount of time is given for a player to score a touchdown.
		Since the game is concerned with scoring touchdowns, in the 1 player game the player is always on offense. Players can score points for completing passes and for gaining yards by either rushing or passing. Scoring a touchdown also adds to the players score, as does any remaining time on the clock after the touchdown is scored. Also after a touchdown, the player can add on to their score by kicking an extra point. 
		Other rules on the field include getting a 1st down to add time to the clock. If the player throws an interception or go four downs without getting a 1st down, they are penalized yards.
		In a 2 player game, the second player plays defense until it's their turn to play offense.
	</description>
	<cloneof></cloneof>
	<crc>3D564757</crc>
	<publisher>Nintendo of America Inc.</publisher>
	<developer>Irem Corp.</developer>
	<released>
		<month>Oct</month>
		<year>1985</year>
	</released>
	<genre>
		<1>Sports</1>
		<2>FootBall</2>
	</genre>
	<perspective>Top-Down</perspective>
	<platforms>Arcade, MSX, NES</platforms>
	<players>1-2</players>
	<multiplayer>Same/Split-Screen Multiplayer</multiplayer>
	<media>Cartridge</media>
	<country>United States</country>
	<advertising>
		<box>
			You're the quarterback in this amazingly real football game!
			Enjoy realistic grid-iron action as you move your team up and down the field to victory! Run, pass, kick, punt - you call the plays in this true-to-life football game. Play against the computer, or against a friend, for hours of real football action. The sights, the sounds and the plays are so real, you'll think you're right on the fifty yard line!
		</box>
	</advertising>
	<reviews>
		<1 name="Just Games Retro">
			Back in 1985, Irem released this game, which has the distinguished honor of being the very first football game for the NES. It fit in with the simplistic, arcade versions of every popular sport that the Big N was pushing out during the console's early year, despite not simply being named "Football." Of course, then it would have to be "American Football," which wouldn't make such a great title.
		</1>
		<2 name="HonestGamers">
			10-Yard Fight was relevant at one time. It was, after all, one of the first football games on NES (if not the first). However, even within the system's life cycle, the genre evolved so much that this particular game is now redundant. By 1991, we youngsters had Tecmo Super Bowl and its customizable playbook, exchangeable players, smarter computer opponents, and solid mechanics to keep us company. Very few of us even entertained the possibility of playing 10-Yard Fight after that point. If nothing else, the game's lack of standout characteristics saw to that…
		</2>
	</reviews>
	<rating>HSRS - GA (General Audience)</rating>
	<enabled>Yes</enabled>
</game>

Found some inconsistencies with release dates, but other than that the information seems rather easily scraped. I am unsure on how to handle "multiple levels." Am I close with my example?
 

brolly

Administrator
Developer
Looks good. I actually prefer the GameFAQs genre system, but for Moby doing it like that works fine. Instead of adding sequential genre inner tags, maybe use the headers for the tags instead:
<genre>
<Genre>Sports</Genre>
<Perspective>Top-Down</Perspective>
<Sport>Football (American)</Sport>
</genre>

Headers might differ; Theme might be used instead of Sport, for instance. Check:
http://www.mobygames.com/game/amiga/out-of-this-world/release-info

Don't worry too much about the format; a converter can be written to take your files and convert them to something else later on if that's needed. The important thing is that all the info is there.

You might want to add the URL for the moby game page in there too so it can be back-referenced (or the game ID only since the URL can be built from it).
Also add the MobyScore and MobyRank ratings.
Not sure how you are parsing the Specs tab, but unless you're doing it dynamically you might have issues because the content on that tab can vary from system to system.
The "Part of the Following Groups" section could be scraped as well (add both the name and URL) so the games can be grouped together.
I'm not particularly interested in having the reviews there (but it doesn't hurt having them, of course); I would probably be more interested in seeing the Releases info there, listing the countries where the game was released.
 

ghutch92

New member
Supporter
RL Member
VIP
This looks promising. For me, adding the "Part of the Following Groups" section is important. Maybe something like:
Code:
<groups>
   <1> group name one</1>
   <2>group name two</2>
</groups>

Also, it would be nice if you added a MobyGames URL. Using brolly's example:
Code:
<mobygames.com>http://www.mobygames.com/game/amiga/out-of-this-world</mobygames.com>

I prefer to use tags and labels rather than a hierarchical genre system like the one GameFAQs uses. To me, adding genres for games that cross genres would be easier using a tag or label system. The frontend can always group them using filters if you need a hierarchy. In other words, I like the way you did it, dustin.
 

Epsilon

New member
This would be a lot of help when adding somewhat unsupported or recently supported systems that don't have established XMLs.
Personally, I'm most interested in the genre tags, as I use those a lot to set up genres for individual systems in HyperSpin.
 

brolly

Administrator
Developer
The problem with using numbered inner tags in the XML (1, 2, 3, etc.) is that this won't allow you to create an XSD to validate your XML files against a schema, so you should avoid doing it that way. This applies to both the genres and reviews tags.
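
A quick Python check of the numbered-tag problem: XML element names may not start with a digit, so such a file fails to parse at all, before any schema comes into play. The repeated `<genre type="...">` layout is only one possible alternative, not a format agreed on in this thread:

```python
import xml.etree.ElementTree as ET

numbered = "<genre><1>Sports</1><2>Football</2></genre>"
repeated = (
    '<genres><genre type="Genre">Sports</genre>'
    '<genre type="Sport">Football (American)</genre></genres>'
)


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(numbered))  # False: a tag name may not start with a digit
print(is_well_formed(repeated))  # True

# Repeated elements keep their document order, so numbering is not needed.
genres = [(g.get("type"), g.text) for g in ET.fromstring(repeated).iter("genre")]
print(genres)
```

Repeating one element name also makes it straightforward to describe the list in a schema as "zero or more `genre` elements".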
 

dustind900

Member
Supporter
RL Member
Thanks for the input, guys. I was getting pretty far into the parsing code when I realized there is a much more efficient way to do things... jQuery! Using jQuery, if the site ever changes its layout, all I have to do is change the selectors to match the correct elements. Say goodbye to parsing messy HTML responses. Hopefully, if work isn't too crazy this week, I'll have everything coded by this weekend.
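
The selector idea can be sketched in Python, as a stand-in for jQuery rather than the scraper's actual code. The element ids and the page fragment are invented, and the sketch assumes well-formed markup; real MobyGames HTML differs and would likely need a proper HTML parser:

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment; the real MobyGames markup differs.
PAGE = """<html><body>
  <div id="publisher"><a href="#">Nintendo of America Inc.</a></div>
  <div id="developer"><a href="#">Irem Corp.</a></div>
  <div id="genre"><a href="#">Sports</a></div>
</body></html>"""

# The only place to edit when the site layout changes.
SELECTORS = {
    "publisher": ".//div[@id='publisher']/a",
    "developer": ".//div[@id='developer']/a",
    "genre": ".//div[@id='genre']/a",
}


def extract(page, selectors):
    """Apply each selector to the page and collect the matched text."""
    root = ET.fromstring(page)
    result = {}
    for field, path in selectors.items():
        node = root.find(path)
        result[field] = node.text.strip() if node is not None and node.text else ""
    return result


info = extract(PAGE, SELECTORS)
print(info)
```

The extraction loop never mentions a specific field, so a layout change only touches the `SELECTORS` table.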

edit:
Let me know if I have everything from the main page that you want...
  1. title
  2. publisher
  3. developer
  4. release date
  5. other platforms
  6. genre
  7. perspective
  8. sub-genre
  9. moby rank
  10. moby score
  11. moby game URL
  12. description - needs to be parsed and cleaned - contains the following:
    • description
    • groups
    • trivia
    • alt. titles
    • others?
Just let me know what other info you want added. This is going much faster now that I switched my methods.
 

brolly

Administrator
Developer
Maybe add "The Press Says" as well, keeping review scores from the press might prove to be useful to rate games.

From the other pages I think besides Specs and Rating Systems, Releases would also be useful even if only the country names.
Are you thinking about grabbing boxart too?
 

ghutch92

New member
Supporter
RL Member
VIP
Number of players; if multiplayer, what kind of multiplayer (local, online, both); languages; and please allow for multiple genres and sub-genres/themes.
 

dustind900

Member
Supporter
RL Member
@brolly
You mean the image URLs or the actual images themselves?

@dogway
MobyGames. Unfortunately they don't have an API, so parsing is a little messy. I am currently also working on scraping from TheGamesDB and the archiveVG (if I can get an API key?). Then GameFAQs is next...
 

dustind900

Member
Supporter
RL Member
Here is an actual scrape output. Still need to clean a few more items from the main tab, and add the other tabs. I'll probably go back to using an XML file after I get everything sorted out.

Code:
Title = Donkey Kong

Publisher = Nintendo of America Inc.

developer = Ikegami Tsushinki Co., Ltd.|Nintendo Co., Ltd.

released = Jun|1986

genre = Action

perspective = 3rd-Person Perspective|Platform

sub_genre = Arcade

platforms = Amstrad CPC|Apple II|Arcade|Atari 2600|Atari 7800|Atari 8-bit|Coleco Adam|ColecoVision|Commodore 64|Game Boy Advance|Intellivision|MSX|Nintendo 3DS|PC Booter|TI-99/4A|TRS-80 CoCo|VIC-20|Wii|Wii U|ZX Spectrum

rating = 

moby_rank = 65

moby_score = 3.4

misc = 

boxart = http://www.mobygames.com/game/nes/donkey-kong/cover-art/gameCoverId,206582/

URL = http://www.mobygames.com/game/nes/donkey-kong

Alternate Titles = "Donkey Kong-e",e-Reader title|"DK",Common abbreviation|"???????",Japanese spelling

description = Released in the arcades in 1981, Donkey Kong was not only Nintendo's first real smash hit for the company, but marked the introduction for two of their most popular mascots: Mario (originally "Jumpman") and Donkey Kong. Donkey Kong is a platform-action game that has Mario scale four different industrial themed levels (construction zone, cement factory, an elevator-themed level, and removing rivets from girders) in an attempt to save the damsel in distress, Pauline, from the big ape before the timer runs out. Once the rivets are removed from the final level, Donkey Kong falls, and the two lovers are reunited. From there, the levels start over at a higher difficulty. Along the way, Mario must dodge a constant stream of barrels, "living" fireballs, and spring-weights. Although not as powerful as in other future games, Mario can find a hammer which allows him to destroy the barrels and fireballs for a limited amount of time. Additionally, Mario can also find Pauline's hat, purse and umbrella for additional bonus points. Donkey Kong is also notable for being one of the first complete narratives in video game form, told through simplistic cut scenes that advance the story. It should also be noted that in many conversions of the original coin-op game for early 1980's consoles and computer-systems, Donkey Kong only used two or three of the original levels, with the cement factory most often omitted.

Can't figure out how to get the Japanese font physically into a file. I can see it fine in the msgbox, but in a file it's all question marks ("?????????").
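
For anyone consuming this intermediate format, a small Python sketch of reading the `key = value|value` lines back into lists. The `parse_scrape` helper is hypothetical, and it assumes `|` never appears inside a value, which may not hold for descriptions:

```python
# Shortened copy of the scrape output; the field names come from the post above.
SAMPLE = """Title = Donkey Kong
developer = Ikegami Tsushinki Co., Ltd.|Nintendo Co., Ltd.
released = Jun|1986
rating =
URL = http://www.mobygames.com/game/nes/donkey-kong
"""


def parse_scrape(text):
    """Turn 'key = a|b|c' lines into a dict of lists (empty list if no value)."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")  # split on the first '=' only
        if not sep:
            continue  # blank line or a line without a key
        value = value.strip()
        record[key.strip()] = value.split("|") if value else []
    return record


record = parse_scrape(SAMPLE)
print(record["developer"])
print(record["rating"])
```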
 

ghutch92

New member
Supporter
RL Member
VIP
It probably has something to do with Unicode. Look to see if you need to change the encoding on the file you saved to.
 

brolly

Administrator
Developer
You're probably saving the file in ANSI format.
You can easily set what encoding to use in an XML file.
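
As an illustration in Python (the scraper's own language isn't stated in this thread), writing the XML as UTF-8 with an explicit declaration keeps the Japanese title intact. The `altTitle` element name and the file name are made up for the example:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

title = "ドンキーコング"  # "Donkey Kong" in Japanese, used here as sample text

game = ET.Element("game", name="Donkey Kong")
alt = ET.SubElement(game, "altTitle", note="Japanese spelling")
alt.text = title

path = os.path.join(tempfile.mkdtemp(), "donkey_kong.xml")
# encoding="utf-8" with xml_declaration=True writes the text as UTF-8 bytes
# and puts an XML declaration naming that encoding at the top of the file.
ET.ElementTree(game).write(path, encoding="utf-8", xml_declaration=True)

with open(path, encoding="utf-8") as handle:
    saved = handle.read()

print(title in saved)

# Forcing the same text through an ASCII-only encoding with errors="replace"
# reproduces the question marks.
print(title.encode("ascii", errors="replace").decode("ascii"))
```

The same principle applies in any language: pick a Unicode encoding when opening the output file and declare it in the XML header.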
 

lamuelwms

New member
RL Member
Not sure if this will help you much. I know we touched on it briefly, dustin, but I have developed a plugin-based scraper system for RadKade LITE that allows users to create their own scraping scripts. This is the script we made for MobyGames.com; it scrapes everything and puts it in a nice little object which can be read by the system. If you want the full source code to our scraper (so far it scrapes 13 different sites and returns artwork meta data and so forth), I would be happy to provide it; let me know. Here is the MobyGames.com script: http://pastebin.com/RWy39cMG

I'll just leave this here:

SUREvmH.png
 