Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try alternative URLs for MNIST dataset download #589

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

f0k
Copy link
Member

@f0k f0k commented Jan 26, 2016

Fixes #587. But it makes things a bit ugly... I tend towards holding back until download availability really becomes a problem.
Note: This will work on Python 2 and 3 on an English Ubuntu, but I don't know if it's portable to other languages or Windows, Mac. Would be interesting to know what Exception exactly urlretrieve('http://foo.bar', 'foo') throws on other systems.

@liebald
Copy link

liebald commented Jan 26, 2016

Instead of building your own redundancy scheme in code, why not simply point to a cloud-based storage location, such as the S3 bucket in your list (or alternatively any Dropbox/Gdrive/CDN)? These should have high enough availability to basically make this a non-issue (as opposed to yann.lecun.com, which seems to be hosted somewhere at NYU, perhaps just a single machine sitting under someone's desk)

@f0k
Copy link
Member Author

f0k commented Jan 26, 2016

why not simply point to a cloud-based storage location, such as the S3 bucket in your list

Well, firstly because we (@ebenolson, actually) pay for S3 and secondly because I'd prefer to include the official and probably most permanent location. Not sure if the costs for S3 would noticeable, but people use the example code as a basis for other experiments and so the link could spread quite a bit.

It's the first time I've seen LeCun's website being down, by the way -- from my (limited) experience, it's highly available. That's why I propose to just leave it the way it is for now.

As a compromise, we could just list the alternative download locations in a comment so they're easy to swap in (this wouldn't help with continuous integration, but at least make it easier for end users).

urlretrieve(source + filename, filename)
except URLError as exc:
if exc.args[-1].args == (-2, 'Name or service not known'):
sources.remove(source) # do not try this source again
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep in mind sources is mutable. I think you might be doing this intentionally, but it's setting off all my Python gotcha alarms.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought the comment would explain. But yes, it's on purpose. If the primary location is down when downloading the first file, I don't want it to be tried again for the next three downloads. I didn't call it ugly for no reason ;) We could move the sources list into the outer context to make it clearer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants