After studying dozens of similar questions on SO, I still can't figure this one out.

In a browser, when I log in to the website https://www.exmaple.com/

I get access to https://www.exmaple.com/reports-listing

where the actual data is loaded inside an iframe:

<iframe name="ReportContentFrame" src="javascript:'<html></html>';"> Actual data table! </iframe>

The iframe src is filled in when the page renders, and that URL is also accessible directly from the address bar once logged in: https://www.exmaple.com/EmbeddedReport.aspx? ...[long list of parameters]

The actual data then loads in the browser.

Here is my Python snippet trying to access the actual data:

import requests
from bs4 import BeautifulSoup

login_url = 'https://www.exmaple.com/'
report_url = 'https://www.exmaple.com/reports-listing'
data_url = 'https://www.exmaple.com/EmbeddedReport.aspx? ...[long list of parameters]'

# log in (replace the placeholder strings with real credentials)
s = requests.Session()
payload = {'login_form_name': '<login_name>', 'passwd_form_name': '<pass>'}
req = s.post(login_url, data=payload)

So far so good. req.text shows I'm successfully logged in.

# now go to report_url with the same session
req = s.get(report_url)  # passing cookies=req.cookies.get_dict() makes no difference

print(BeautifulSoup(req.content, 'html.parser').find("iframe", {"name": "ReportContentFrame"}))

The protected content on /reports-listing is returned correctly, but the actual data inside the iframe is not loaded. Now, when I try to access data_url directly with the same session:

req = s.get(data_url)

req.text shows I'm not logged in!
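
One way to see where the login gets lost might be to list every redirect hop for that request. A minimal sketch, assuming resp is whatever s.get(...) returned; redirect_chain is a hypothetical helper name, not part of requests:

```python
def redirect_chain(resp):
    """Return [(status_code, url), ...] for each redirect hop, ending with the final response."""
    # resp.history holds the intermediate responses, oldest first
    hops = [(r.status_code, r.url) for r in resp.history]
    hops.append((resp.status_code, resp.url))
    return hops
```

Calling redirect_chain(s.get(data_url)) and printing s.cookies.keys() next to it would show whether the request bounces to a login page and which cookies the session actually holds.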

Now I suspect ASP.NET's __RequestVerificationToken, __VIEWSTATE, __EVENTVALIDATION hidden inputs might be lost in translation but I have no clue how ASP.NET works. Shouldn't they be contained in the session cookies and passed on faithfully?
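
From what I have read, these values are hidden form inputs in the page HTML rather than cookies, which would mean a Session does not resend them on its own. A minimal sketch for listing them from any response body, using only the standard library; HiddenInputCollector and hidden_fields are hypothetical names of mine:

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect (name, value) pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = []  # a list, so repeated names are kept

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if is_hidden and attr.get("name"):
            self.fields.append((attr["name"], attr.get("value") or ""))


def hidden_fields(html):
    """Return every hidden input in the HTML string, in document order."""
    parser = HiddenInputCollector()
    parser.feed(html)
    return parser.fields
```

For example, hidden_fields(s.get(login_url).text) would give the pairs that could be merged into payload before posting. Since each page carries more than one __RequestVerificationToken, which one belongs to the login form still has to be checked by hand.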

Going back to the snippet:

# log in (replace the placeholder strings with real credentials)
s = requests.Session()
payload = {'login_form_name': '<login_name>', 'passwd_form_name': '<pass>'}
req = s.post(login_url, data=payload)

# prints 2: two dynamic __RequestVerificationToken inputs in the login_url HTML
print(len(BeautifulSoup(req.content, 'html.parser').find_all("input", {"name": "__RequestVerificationToken"})))

req = s.get(report_url)
# prints 3: three dynamic __RequestVerificationToken inputs in the report_url HTML
print(len(BeautifulSoup(req.content, 'html.parser').find_all("input", {"name": "__RequestVerificationToken"})))

req = s.get(data_url)
# prints 0: no __RequestVerificationToken inputs in the data_url HTML
print(len(BeautifulSoup(req.content, 'html.parser').find_all("input", {"name": "__RequestVerificationToken"})))
# __VIEWSTATE and __EVENTVALIDATION first appear here, but data access is already denied
soup = BeautifulSoup(req.content, 'html.parser')
__viewstate = soup.find_all("input", {"name": "__VIEWSTATE"})[0].attrs['value']
__eventvalidation = soup.find_all("input", {"name": "__EVENTVALIDATION"})[0].attrs['value']

All three values (__RequestVerificationToken, __VIEWSTATE, __EVENTVALIDATION) are dynamic. So how do I access the HTML content for data_url?

Thank you.
