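It all started because I wanted to look at the API the personal portal uses to query library loans, so that I could write a return-reminder program (I always forget to return books, and once they are overdue the library not only doesn't remind you, it also won't let you renew directly; you have to pay a late fee...). Then things got out of hand: I ended up studying the login logic of HNU's personal portal and roughly reimplementing it in Python.

⚠️ Warning: the experiments described in this post are for learning and discussion only. Do not use them for any illegal purpose.

Note: to keep my account safe, some characters in every example string in this post have been altered.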

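Screenshot: Hunan University's "Unified Identity Authentication" page

Page under study: http://cas.hnu.edu.cn/cas/login. This is actually the "Unified Identity Authentication" service, presumably meant to be something like SSO. Oddly enough, visiting this site directly over https fails, yet after login it redirects to https.

After pressing "Log in"

Inspecting the element shows that the HTML of the whole form looks like this: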

<form id="fm1" action="/cas/login" method="post" data-bitwarden-watching="1">
    <div class="form-group">
        <p class="error text-left" id="errormsg"></p>
        <div class="input-group">
            <div class="input-group-addon"><img src="assets/images/user.png"></div>
            <input id="username" name="username" class="form-control user-input" tabindex="1" placeholder="职工号/学号" type="text" value="" size="25" autocomplete="off" data-com.bitwarden.browser.user-edited="yes">
            <input type="button" id="sendsms" class="btn btn-primary get-code-btn" value="获取动态口令" style="display:none">
        </div>
    </div>
    <div class="form-group">
        <div class="input-group">
            <div class="input-group-addon"><img src="assets/images/lock.png"></div>
            <input id="password" name="password" class="form-control pwd-input" tabindex="2" onkeyup="CreateRatePasswdReq(this);" placeholder="请输入个人门户密码" type="password" value="" size="25" autocomplete="off" data-com.bitwarden.browser.user-edited="yes">
            <a class="forget-pwd" target="_blank" href="http://cas.hnu.edu.cn/securitycenter/findPwd/index.zf">忘记密码<!--忘记登录密码 ?--> </a>
        </div>
    </div>
    <div class="code" id="kaptcha" style="display: none;">
        <!--
        <div class="col-xs-2 text-left p-r-0 code-text">
            <span>验证码 </span>
        </div>-->
        <div>
            <div class="col-xs-5" style="border:1px solid #777777;">
                <input id="authcode" name="authcode" class="form-control" tabindex="2" style="text-align:center;" placeholder="输入验证码" type="text" value="" size="10" autocomplete="off">
            </div>
            <div class="col-xs-7 p-0 code-img"><img id="yzmPic" onclick="javascript:refreshCode();" class="" src="kaptcha?time=1652268176109">
                <a href="javascript:refreshCode();">
                    看不清<!--看不清 ?--> </a>
            </div>
        </div>
    </div>
    <div class="login-info row">
        <div class="col-lg-12 col-md-12 col-sm-12 col-xs-12  login-button"><button type="button" class="btn btn-block btn-primary login-btn" id="dl">登 录 </button></div>
        <div class="col-lg-3 col-md-3 col-sm-3 col-xs-4 remember-me " style="display:none;">
            <input class="default-checkbox checkbox" type="checkbox" name="rememberMe" value="true" id="rember">
            <label for="rember">记住我 </label>
        </div>
    </div>
    <input type="hidden" name="execution" value="9e4a732f-501e-4dee-8a8f-2e6a9faedb4a_ZXlKaGJHY2lPaUpJVXpVeE1pSjkuUTNaUWVrNXVRMVl6WVhWM2EwMWlWbnAyWkM5MFZHdFdjbUoyU2pFMmFFTk1iSHBEWWxoemMzcDFOMlZEZDNZclZrZElka05MTjBkUFpraDFTR1V6VmxCNVVUTkhMMjVsWXpRNFpUQktOMk4yVTA1VVExWnBUV2RZUzJWUFpVTTBPSEpOUm1sSVJUZ3lOMUJ4Tm1OdldrNW9jRTVsTWtwcFpYVm9PRGRhYlROek5tOVVlVW80ZUhaMGJUVkJhREZpVUdrMEwwbGxlbFZKVkVKS1VUUjFMMUZXZUZSYWRVOVdaVkZTTVVNclIyWXlNVU0wZFVKVlNVRXlRbEE1UVdrNWIzQnVZMjVHVVN0M1JtUmhhMXBNUzA4NGRFdzVTbFpLTlVFMlUwRm9UMVIyV2pCUWIzTkxhRnBGVlVkNVlucFFVVmxJVVcwemJrUlVVRGwwTkRBMlZtVndjQzhyT0c0cmRpdFdXV0pWVm5KUWJFTjNhRFJXVFdKUWFWWjJVM2xMWkhSNVkyRk5XVW8zYUVNelRFWmtVUzlMZEU1QlVqY3paMUUzV0d0VllUZHVWRU5GUkdsb2VXUnZWbVVyVVdaRU5IRm1kbk5aWjFKR05WaExWMUl3YkdFcmJUWlFaMVZhVFhVNVNuSjFPRXN5V0doaEx6WnphekJIVFZvd1N6WXZRVWQ1TTBoM1lUSTBRV3RsVmxWYVdFZHRTUzlJVkVWTVptOUdjMm9yTkdSdk5VdHdkbTlqY25wYVluUnRaRGR3YmxCRVFtUlBWR1YyZEVKS1ZFczVhMVJDYlRGek9VZDVkVEpRYnpsYWNsWTNNVEozVm5CaVZsb3ZMMlpUT0hoR1ZrRkdOM1JKTkRCcGNITmxhRzh4VFd4b2VGZHpOaXMyWm1oR1IyRjNVa1lyY201clkyNTNSSEJsZEVkaGIxSkxkVkZaUW5KTFlscExjaXRYYTBSa1FqazRZalk0ZFVsTWFTc3ZUazFUVlhSTGRUVndUMlVyYWxsUVRHOTNkbEZrUjFOcEsxaEtlbUl6TXpWUVNERXhZelZRVEZkVlMxRk5VbFJqVVZkQ2FtMVhTa1JEZDBkdk5WbHpkV1Z1Y0dsM1UwVkliekZaVjFaNWVUaGliVTVZUW00MWFVTlFUbWQzY1daUk1YQjBPV0p1YzJaNmMwOVdPV0UwU0ZScEsyY3dZWE5LWjBOMGRWWnlkREV5TnpGd1ZFZFpkalJOU0RrMFQzWkZUR1JKVFVsamVtSkhWWGwxV2xoaGVreFJRV3BIT0dKdE5FVk9PVUZKVFVrM2RVVkRWa2tyT0N0SU5VUlNWMFJUWVUxTU5WSkVVemsyUm5nMlVFMXBVbEpyWldoU2JFSk9jM2g1YWs1cFZGQkhha3BKZEVsSGNHTTFhVGxSWVdGR1JrczFjSGRhVTJoaVVXMW1NVkpsUm1NMVRYY3pNMEk1T0N0SlMwTm9OVzlKZWtWQlRHZDBVMGh0TVdGek5WUTBiV3BaVDBVMmJWazBXbU56T0hWeFpUSktWbVV2Y0RsUGEwdGllVlprTldaVmVFUjFOa3RoUTNCWk5YZGpkSG81UkVoSVYyZzBkVXgwYWxaTmFuY3lXa1pWU1VoeUwycHFUWGRpWTFCQ2RrZ3hSMWg1VUhvdmQxY3pUa1EzWW5oQlEyNHZSa1JuV2tKNFNsWlhaVmRXYUhWc1lqQjZRekpSVmpKcVNrdHJURzlqZWxGRk0wSkxaVU52Y0hNMlRtVmhkMWhOV1VodmEzaFVZMHRKTm5FemJHdExjazVLVUhoV1VUaE5NazVCWkZGWk1qVndiRTB3T1RKVGFrZHRiMDFtV25WbFlXSlJNMWg0YkRCc1kwNVhjbnBUYVROblVWTjVha0
1yVW14d1MyRktXRmhSV1ZkbWJsRkVaRGhFV1ROR1VEWlpZVnAyYWtkR2NVMUZTR1ExY1ZCM2VVTlZObkJPTVZwNlF5dDFWR3BIZEZORk5WcEhiazFpUVhWRGRtZHpjbU51ZFhWMmNHOVphVlZLTUhRMk5sZ3lLMmRCZUd4Qk16Y3hjbkpoUVdoVlYzUXZhVkk1TVVGS2FERmtVRWw0ZWtkdlVVRlNZVE5LV2t3emVqVjZTazF3TjFSdFJGZzBPVm9yT1hCNFNpdEJVbXMxU0hWWWJWcFBhMVY1VmxkTU1TdHlkVXRrTVdoa01FOTZhMmhOTUhnMWVrVmpTbHB6YlZjd0sweFdkRXBYV1UweVpESkJkRWwyWWxOM1YyMUpTME01VVRsc1VuRjZWVk5OS3prNEwzbEZLM2xzYzNoVVpHZEliMUkxZVhKTWRUbHNObE5pVkdOYVV6ZFlVWE41TTB0blJ6bE1iMFJCVmxkMVIzcHFaRFJpVTBoWGVHVkpXWEY1VkVaWGR6TmFWMDlwVTNoYVlXbHNkWFJUUkVvMGVXdHBVV3QwZVVSUU16SkllR2xXYlcxUmJHaHlSMmR6ZG5SdE1WbHpZV0pZVEhOalkxbDFNRXB0VmxCWFF6Sk5RWFZpZDBsUVJUZFJWMkZ3UWxka01qVjZVbXBqZFhRMk1HUjRjRnBrU1hFM1lWb3pZV3hSVkU5SlJFUkdkMU0yTlU1VGJFeHlZMUpVYUZnMVJISmhNRVl5Y0RSUFltUnNSVW93VFhwT1ZWZzBkV3RESzNwNFNFTjFSR1ZxVUhkNmFtdGlVR0ZxU1RGVE1rUmhMMVpRTUhOUWIwcEQuMXNRWGZiNDlvbUJhLW9HYV9lbGpfMTF0WDdHVmJzMzVZb3ZkVEE0eHkweHR5azVWanBLS2VfbW5XSjZkeTdlZEV0SDNfQ3dTTkRYX3dvMFNGRWdEQUE=">
    <input type="hidden" name="_eventId" value="submit">
</form>

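The form's id is fm1, and the "Log in" button inside it has the id dl.

File tree

Looking at the file tree, we can guess that /cas/js/login/login.js handles the login action, pwdYz.js checks password strength, and security.js contains the RSA encryption code used below.

So what happens when the login button is pressed? login.js contains this code: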

jQuery.getJSON("v2/getPubKey",function(data){
    Modulus = data["modulus"];
    public_exponent = data["exponent"];
});

// ...

$("#dl").click(function(){
    checkForm();
})

// ...

function checkForm(){
    // ...
    var password = $("#password").val();
    var key = new RSAUtils.getKeyPair(public_exponent, "", Modulus);
    var reversedPwd = password.split("").reverse().join("");
    var encrypedPwd = RSAUtils.encryptedString(key,reversedPwd);
    $("#password").val(encrypedPwd);
    $("#fm1").submit();
}

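When #dl is clicked, checkForm() is called. It first checks that the input fields are not empty, then reverses the password the user typed and encrypts it with a somewhat mysterious RSA routine, and finally submits the form, which sends the request.

The freaky part is that it overwrites the contents of the password input and then submits the form. As a result, right after you click the button the password field suddenly turns into a long string of gibberish, and ordinary password managers keep saving the wrong password... I have no idea why it was written this way.

As for the public key used for the RSA encryption, that is, Modulus and public_exponent, you can see that they are fetched beforehand by a request to v2/getPubKey.

What the POST actually sends

After a normal login, find the POST request sent to /cas/login. Its payload has the following fields: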

username: 202108060109
password: 90c4e8b1be56a82f34df187379147d50bc694556c7451e8fd5c2a4047cf22385151cbc698ed84e0562915487e18f2420f4cb6c633149cef07d192a041038a73f
authcode: 
execution: 9e4a732f-501e-4dee-8a8f-2e6a9faedb4a_ZXlKaGJHY2lPaUpJVXpVeE1pSjkuUTNaUWVrNXVRMVl6WVhWM2EwMWlWbnAyWkM5MFZHdFdjbUoyU2pFMmFFTk1iSHBEWWxoemMzcDFOMlZEZDNZclZrZElka05MTjBkUFpraDFTR1V6VmxCNVVUTkhMMjVsWXpRNFpUQktOMk4yVTA1VVExWnBUV2RZUzJWUFpVTTBPSEpOUm1sSVJUZ3lOMUJ4Tm1OdldrNW9jRTVsTWtwcFpYVm9PRGRhYlROek5tOVVlVW80ZUhaMGJUVkJhREZpVUdrMEwwbGxlbFZKVkVKS1VUUjFMMUZXZUZSYWRVOVdaVkZTTVVNclIyWXlNVU0wZFVKVlNVRXlRbEE1UVdrNWIzQnVZMjVHVVN0M1JtUmhhMXBNUzA4NGRFdzVTbFpLTlVFMlUwRm9UMVIyV2pCUWIzTkxhRnBGVlVkNVlucFFVVmxJVVcwemJrUlVVRGwwTkRBMlZtVndjQzhyT0c0cmRpdFdXV0pWVm5KUWJFTjNhRFJXVFdKUWFWWjJVM2xMWkhSNVkyRk5XVW8zYUVNelRFWmtVUzlMZEU1QlVqY3paMUUzV0d0VllUZHVWRU5GUkdsb2VXUnZWbVVyVVdaRU5IRm1kbk5aWjFKR05WaExWMUl3YkdFcmJUWlFaMVZhVFhVNVNuSjFPRXN5V0doaEx6WnphekJIVFZvd1N6WXZRVWQ1TTBoM1lUSTBRV3RsVmxWYVdFZHRTUzlJVkVWTVptOUdjMm9yTkdSdk5VdHdkbTlqY25wYVluUnRaRGR3YmxCRVFtUlBWR1YyZEVKS1ZFczVhMVJDYlRGek9VZDVkVEpRYnpsYWNsWTNNVEozVm5CaVZsb3ZMMlpUT0hoR1ZrRkdOM1JKTkRCcGNITmxhRzh4VFd4b2VGZHpOaXMyWm1oR1IyRjNVa1lyY201clkyNTNSSEJsZEVkaGIxSkxkVkZaUW5KTFlscExjaXRYYTBSa1FqazRZalk0ZFVsTWFTc3ZUazFUVlhSTGRUVndUMlVyYWxsUVRHOTNkbEZrUjFOcEsxaEtlbUl6TXpWUVNERXhZelZRVEZkVlMxRk5VbFJqVVZkQ2FtMVhTa1JEZDBkdk5WbHpkV1Z1Y0dsM1UwVkliekZaVjFaNWVUaGliVTVZUW00MWFVTlFUbWQzY1daUk1YQjBPV0p1YzJaNmMwOVdPV0UwU0ZScEsyY3dZWE5LWjBOMGRWWnlkREV5TnpGd1ZFZFpkalJOU0RrMFQzWkZUR1JKVFVsamVtSkhWWGwxV2xoaGVreFJRV3BIT0dKdE5FVk9PVUZKVFVrM2RVVkRWa2tyT0N0SU5VUlNWMFJUWVUxTU5WSkVVemsyUm5nMlVFMXBVbEpyWldoU2JFSk9jM2g1YWs1cFZGQkhha3BKZEVsSGNHTTFhVGxSWVdGR1JrczFjSGRhVTJoaVVXMW1NVkpsUm1NMVRYY3pNMEk1T0N0SlMwTm9OVzlKZWtWQlRHZDBVMGh0TVdGek5WUTBiV3BaVDBVMmJWazBXbU56T0hWeFpUSktWbVV2Y0RsUGEwdGllVlprTldaVmVFUjFOa3RoUTNCWk5YZGpkSG81UkVoSVYyZzBkVXgwYWxaTmFuY3lXa1pWU1VoeUwycHFUWGRpWTFCQ2RrZ3hSMWg1VUhvdmQxY3pUa1EzWW5oQlEyNHZSa1JuV2tKNFNsWlhaVmRXYUhWc1lqQjZRekpSVmpKcVNrdHJURzlqZWxGRk0wSkxaVU52Y0hNMlRtVmhkMWhOV1VodmEzaFVZMHRKTm5FemJHdExjazVLVUhoV1VUaE5NazVCWkZGWk1qVndiRTB3T1RKVGFrZHRiMDFtV25WbFlXSlJNMWg0YkRCc1kwNVhjbnBUYVROblVWTjVha01yVW14d1MyRktXRmhSV1ZkbWJsRkVaRGhFV1RO
R1VEWlpZVnAyYWtkR2NVMUZTR1ExY1ZCM2VVTlZObkJPTVZwNlF5dDFWR3BIZEZORk5WcEhiazFpUVhWRGRtZHpjbU51ZFhWMmNHOVphVlZLTUhRMk5sZ3lLMmRCZUd4Qk16Y3hjbkpoUVdoVlYzUXZhVkk1TVVGS2FERmtVRWw0ZWtkdlVVRlNZVE5LV2t3emVqVjZTazF3TjFSdFJGZzBPVm9yT1hCNFNpdEJVbXMxU0hWWWJWcFBhMVY1VmxkTU1TdHlkVXRrTVdoa01FOTZhMmhOTUhnMWVrVmpTbHB6YlZjd0sweFdkRXBYV1UweVpESkJkRWwyWWxOM1YyMUpTME01VVRsc1VuRjZWVk5OS3prNEwzbEZLM2xzYzNoVVpHZEliMUkxZVhKTWRUbHNObE5pVkdOYVV6ZFlVWE41TTB0blJ6bE1iMFJCVmxkMVIzcHFaRFJpVTBoWGVHVkpXWEY1VkVaWGR6TmFWMDlwVTNoYVlXbHNkWFJUUkVvMGVXdHBVV3QwZVVSUU16SkllR2xXYlcxUmJHaHlSMmR6ZG5SdE1WbHpZV0pZVEhOalkxbDFNRXB0VmxCWFF6Sk5RWFZpZDBsUVJUZFJWMkZ3UWxka01qVjZVbXBqZFhRMk1HUjRjRnBrU1hFM1lWb3pZV3hSVkU5SlJFUkdkMU0yTlU1VGJFeHlZMUpVYUZnMVJISmhNRVl5Y0RSUFltUnNSVW93VFhwT1ZWZzBkV3RESzNwNFNFTjFSR1ZxVUhkNmFtdGlVR0ZxU1RGVE1rUmhMMVpRTUhOUWIwcEQuMXNRWGZiNDlvbUJhLW9HYV9lbGpfMTF0WDdHVmJzMzVZb3ZkVEE0eHkweHR5azVWanBLS2VfbW5XSjZkeTdlZEV0SDNfQ3dTTkRYX3dvMFNGRWdEQUE=
_eventId: submit

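Since this is a plain form submission rather than a request sent from js, you can match each field by name against the HTML above to see where it comes from.

All five fields can be obtained: username and password come from the user (the latter after encryption), authcode is empty as long as no captcha is shown, and execution and _eventId are hidden inputs in the page.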
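The field lookup described above can be sketched with the standard library alone. This is a minimal sketch, not the approach used later in this post: the class and function names are made up for illustration, the sample HTML is a trimmed stand-in for the real form, and the execution value is a placeholder. Unchecked checkboxes are skipped because a browser would not submit them either.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs of <input> elements inside one form (hypothetical helper)."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            # Only inputs inside the form with the requested id are collected.
            self.in_form = attributes.get("id") == self.form_id
        elif tag == "input" and self.in_form:
            name = attributes.get("name")
            # Unchecked checkboxes are not part of a submitted form.
            if name and attributes.get("type") != "checkbox":
                self.fields[name] = attributes.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def collect_form_fields(html, form_id="fm1"):
    parser = FormFieldCollector(form_id)
    parser.feed(html)
    return parser.fields


# Trimmed stand-in for the login form; the execution value is a placeholder.
sample = """
<form id="fm1" action="/cas/login" method="post">
  <input id="username" name="username" type="text" value="">
  <input id="password" name="password" type="password" value="">
  <input id="authcode" name="authcode" type="text" value="">
  <input type="checkbox" name="rememberMe" value="true">
  <input type="hidden" name="execution" value="EXAMPLE_EXECUTION_TOKEN">
  <input type="hidden" name="_eventId" value="submit">
</form>
"""

fields = collect_form_fields(sample)
print(sorted(fields))
```

Only execution and _eventId arrive with values; the other three still have to be filled in before posting.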

Also note that the POST carries a couple of Cookies:

JSESSIONID=808D31AC4F6F45E8EA8D8F05E2C76C74.cas30;
_pv0=KI+EtIxj2UDs/BXMv0vdZmI5APblUZJyrVMXRSekKrN5+bfckHbJmzPKaku45uKVc02qmwhWAisV1fsVG3B6NScgp8RNOGmd3vjBGTnPy10/bkYnbNF7sxpkzKqxHuHG9iPSu9gAGf3p1fvnIE5iQQBicDhdYg6cDD/7J45Dzfr1K4UnL4KN7Ph79o3hPCz8++7ECbXgABbOJU5SdCX5KSPkfPP9GcBqwzN1UYUu8Z6/6MGW12okSx4GO7J/j13A0E++BcNpeBdnSGAmI4Pdq4grl92whh8GkCGUzec84fdGx+hXn/w/uggLohZVKxGHkqrwKFYkT0uW2+hR7SZUQTWHH5oVcNqjKxJNUD5a2br0DdM9ethXox+DaTYD5bgeyidNqpgfDRDHTvvGda3r3r5CrwnYP/pmPXBslnllk5U=

In other words, to simulate the POST request we need the five form fields and the two Cookies.

JSESSIONID

When the login page is first loaded, the response already carries a Set-Cookie:

JSESSIONID=4736615D840602A1B50AF6D6298C3A1E.cas32;
  Path=/cas;
  HttpOnly

JSESSIONID is the Cookie that Servlet containers (Tomcat, Jetty) use to track a user's session. Looks like this is the one.
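To read the pieces of a Set-Cookie header like the one above, the standard library's http.cookies module is enough. A minimal sketch; the helper name parse_set_cookie is made up for illustration, and it assumes the header holds exactly one cookie.

```python
from http.cookies import SimpleCookie


def parse_set_cookie(header_value):
    """Return (name, value, attributes) for one Set-Cookie header value."""
    jar = SimpleCookie()
    jar.load(header_value)
    name, morsel = next(iter(jar.items()))
    # Keep only the attributes that were actually present in the header.
    attributes = {key: val for key, val in morsel.items() if val}
    return name, morsel.value, attributes


name, value, attributes = parse_set_cookie(
    "JSESSIONID=4736615D840602A1B50AF6D6298C3A1E.cas32; Path=/cas; HttpOnly"
)
print(name, value, attributes)
```

The Path=/cas attribute means the cookie is only sent back for URLs under /cas, which covers both the login page and v2/getPubKey.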

getPubKey

The page sends a request to v2/getPubKey right after loading, and gets back data like this:

{
  "modulus":"991d141f56c671c9c55d31c35ae55901faaeea8dd4be5b2e6d7e38274e9feada58a4c47a554dae2adb71a6618d42a9d6be817f9aa8f5eb8de53acc56be4e2473",
  "exponent":"10001"
}

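These two values are presumably the components of an RSA public key, both written in hexadecimal: the modulus and the public exponent (10001 in hex is 65537, the usual choice). I haven't worked out the library's internals, but with them the code above can compute the same encrypedPwd.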
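A quick way to sanity-check that reading is to decode both fields as hexadecimal integers. A minimal sketch, assuming the values are hex (the response itself does not say so); load_public_key is a made-up helper name, and the JSON is the altered example from above.

```python
import json

pub_key_json = """
{
  "modulus":"991d141f56c671c9c55d31c35ae55901faaeea8dd4be5b2e6d7e38274e9feada58a4c47a554dae2adb71a6618d42a9d6be817f9aa8f5eb8de53acc56be4e2473",
  "exponent":"10001"
}
"""


def load_public_key(text):
    """Decode the getPubKey response into integers (n, e), assuming hex strings."""
    data = json.loads(text)
    n = int(data["modulus"], 16)
    e = int(data["exponent"], 16)
    return n, e


n, e = load_public_key(pub_key_json)
print(e)               # 65537
print(n.bit_length())  # 512
```

A 512-bit modulus also matches the 128 hex characters of the encrypted password seen in the POST payload.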

The response to this request also carries the following Set-Cookie header:

_pv0=gd1g%2FkuF8nKPObhdqygtrVhcdleX1lJEhbjuFFpqPt42mTUyjKN4icZFlgsrcVJMQYzcP7QluP4NPDUstVZpiyzxnMC%2Bxvj5HZhcKUttP1DgSNgAolMMgwEJD6GMxvV%2FrgffR0SW3i5PDd%2B3p8kGGv57d3Fs0uEVuYa6rtKZpAFoL5k5b%2B8XYtrOqMiLr3RnKT26xLiBv70imOLmSrLn2qPtepFj3D9xxSU5cwzlKTk%2BUSXYowGZlcNo%2BvBsL9dOIy%2FvlI6x6A%2BdqY4RyQ6Ft9KGUnxpZgkSTsSMm86yevTh9YSnzVY%2F8eoz1HA2CsS6Vw6%2FKTSjavRHjVguJY7R7r5CsWa0gVtpF7IT1E1%2FE8qYp5VxDbYLIEVPs85xkjKRbyqg%2F7WkQS2OnV%2F02mCGWAoRtxKs7qD%2BNLSD8eL1aFA%3D;
  Domain=cas.hnu.edu.cn; Path=/

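So that takes care of the _pv0 Cookie.

Password encryption

The password encryption uses security.js from the same directory, a js library dating back to 1998, though I can't really follow how it works.

The header comment names the author, and the code is introduced on the author's website: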

/*
 * RSA, a suite of routines for performing RSA public-key computations in JavaScript.
 * Copyright 1998-2005 David Shapiro.
 * Dave Shapiro
 * dave@ohdave.com 
 * changed by Fuchun, 2010-05-06
 * fcrpg2005@gmail.com
 */

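The author of the code doesn't provide a Python version, so how do we implement it in a Python script? There are several options (from hardest to easiest):

The implementation itself is rather odd: it reverses the user's password before encrypting it, which is a bit puzzling.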
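One possible explanation for the reversal, offered as an assumption that has not been verified against the server: if the JavaScript library packs characters into its big integer least-significant byte first and applies no padding, then reversing the string beforehand makes the result equal to plain "textbook" RSA on the big-endian bytes of the original password. Under that assumption a pure-Python equivalent could be sketched as follows. The function name is made up, the key in the usage line is the classic toy example n = 3233, e = 17 rather than a real one, and the library's exact output formatting (zero padding, chunking of long inputs) is not reproduced.

```python
def textbook_rsa_hex(password, exponent_hex, modulus_hex):
    """Unpadded RSA of an ASCII password; returns lowercase hex.

    Assumption: this matches what the JavaScript code computes for
    passwords that fit in a single block. Not verified against the server.
    """
    n = int(modulus_hex, 16)
    e = int(exponent_hex, 16)
    # Big-endian integer of the original (non-reversed) password bytes.
    m = int.from_bytes(password.encode("ascii"), "big")
    if m >= n:
        raise ValueError("password does not fit in a single RSA block")
    return format(pow(m, e, n), "x")


# Toy key: n = 3233 (0xca1), e = 17 (0x11); "A" is the integer 65.
print(textbook_rsa_hex("A", "11", "ca1"))  # ae6, i.e. 65**17 % 3233 == 2790
```

If the output ever disagrees with what the browser sends, calling the original security.js remains the safer route.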

Sending the POST request

Testing shows that the minimal request only needs the two Cookies and the five form-data fields.

curl 'http://cas.hnu.edu.cn/cas/login' \
  -H 'Cookie: JSESSIONID=2001762369FF0645ED3EC64382613117.im32; _pv0=wgfGE0rwBEr3341hmwwtiojDjR30dBqfRxrBe9kPlurTSU%2B4u%2FlWwZzHtzbY1I2OeeLAsHJfU7ftg%2BuIcZF8hMs3QsayPvTVi%2FUb55RTnLmJjgjp%2B6vti%2BlQj9iXJP6uu0hsUs8Oc2LVNbOiuq6Ezr4iyLHxTr%2BPFc2pxeunZ2XTKIMcKLpnoMcRuSyuG5TLTmUzJYG55A7m6Xqrn0Vu8KjOXlwpadDhi01JZxaHxAC6Ge02Zcc%2BCB3Rv8%2BAiQwE6VLu%2Fvmm%2B3URsxCOpD2FoE2c2xUVjuHVcdbpgkpLUaNahQM3sKBgmG7fr1RyYH7dIvi%2BNqWuNcC3O5KcsoIwGhxo216CQ6c0yrBqqWqfPq%2BTxmlNZyUoXJ%2B5DUgsnB0bYpB7pL7wxtGOGkn4F%2BCv44yDn8%2BUb5oHwQyyZM24%2BeY%3D' \
  --data-raw 'username=202108060109&password=84b38bf84fbdcdf46acd06410fc194de10f896679fb3d2fab1e8e1b725d8828a3610df5afe4ffb8ae31744969b1d5d855a5844d8454bb052fafb15af5f930dd5&authcode=&execution=f77fe468-1dc5-4fdb-b883-f881ed207045_ZXlKaGJHY2lPaUpJVXpVeE1pSjkuUWtFMlJFc3ZlakZCY0ZRM1FYVmFWakZCYW1zdlFXcERlVlp1ZEhWTGNreFBSM0Z5WTFsMkwzbFpkVWRUWmtKak1FUjFVblpaTjJ0UVpVNUljRTlMWkZGdGNFNVhlREo0V21oNVZFY3JLMkVyU0c1dE0zQkZiVEE1TDBaaGVrSXlaMWxrUms1cVdUTXZLMmxhT0hoWVdYWlpWa2Q0TDBWWFJEWm9kRXN3T0VoTmMzbG5jRll3Y1d3dmNEaEZNMlZHYUd0aFZURlNibk5wZGpacWRqWnBhV3BtT1RKa1REZDNlVzQ1WVhBelJrbzVWSFF2U25WYVpERlhiMVZ0TjA5R2JXSm1aVVZDTHpOSU5tdzBUMDVTYm1OV1ZFeHZjbVJZZVhZNWVFeEJaRU4yY2tOdFZHSndNbU5CV0VOckwxWmpTVWxDVGk5RVFXdHJkWFl3Tkc1WmMyNWtUR3N6SzJwbVFtazRhRkoxTW5vMFZtOU5PRkJ4ZFZkYU9WTkZNM1JZVlhKM0syTlRNRkpGTVVzNFVWTnFjbXhUTkV4Vk1HRmtkWEJGWWxwaVpuRnNlVzFMYmtKa01rUldabFZGWVhodUwxaElhVE5WY1VkRVQybGhWVzhyZVZRM1ZYVjNZeXRwYlZwRU1IUjZjRE53V1ZaeE1VbGxMMmRIT1N0dlYyOVdhVVZPV0c1b2RHeHBNM05qZGtSTFJsbDBNSFoyWTJGR2JqVnlWazlWUW10TFpuUm9RME5yYldwbmMyVTRjbVJ2VkhoeFJqUkJhR1JqUjJoWlJDdG5URTVRUlhOUVJrTlVVM3BtY0VaNmExUlBja00xV21sS056RjRWR1ZHTDAxU1kwZGtTRTVtWldoR1JGb3JNWGR0YWxwd1QwcFBOelJ3VjFCNVNFYzFibVl6THpaTk1rdzBkSEJ5TW1KS05ETkhZVnBQU0ZsbmFTdFJhM3BEVUhGbU1VSkxNMk00ZW1WdGRrWlpia1E1ZDBSWVQwUkZiVVozSzNGeWFGTm9lbTVXWkd3dmJUTkhOMWRrV1RSd0szRnNUaXQ2UlRoQ1VHcDFWeTlwUzJ0MlFXUktOVzF5ZVV0b00zSnNORkozVGtOTmRtbFllVEpESzBJM1dWVlFSelp0VDJzNVoycDZPVFJKUm5nNE1tc3pUR2hHYUhkNVVIZHFTa1J1VDJSU05ucHVWVFZ4VmtSRmFXVjRSVk5MYnpOU2JGRnJiV05aTmxBMFdUUlhhbTVEZFZKVmJqWXhaM1IyUjBSdFdrWTVjRzE0UkU5TWJrcE5ORzFsT0M5NWRVRmFVV1pZU25adFREUnZkalZYWXpaWWRXOVNSVmhWUnpNMkwxaEJiM3AzWWk5clJtSTFSWFZ6YVZKMmVuSmlkbFpCWWs0cmJYYzNhbkpvVEdkVFVqUlBSWGxNY1c1WE1uRXdlVFZoV0VOQmRFVlRLellyVmpOd2FWSk9lR3hqWVRKSWIzZEdSVEZ3UkVKdE5FUXpRVUpGUW1WVGJtMUVSR3BWU2xWU1NXMUNaVGwyWlVwVGNFVnFOR2dyTUVaUVdVSmtiMk40Y0ZSdFlXbDJWREJPUkVORlJtTmxibWhUVFVOWVYzRmpUaXR6YUVObVRqRm9RWEU1V25jMFdHTlhkalJWY1haVGEyUnNURzFUWlZCRWVuSnVia1YyY1VOaU1tODRMMDl0V0ZSSGJHOTFVR3B6ZFZSQlpqZG5PVVJNY0hKcFFqWlpVb
VVyTm5VMk4ybEJkWFpqUkc0MGJrZ3pTRkEyVWswelZXdEJlSGxHTVRsd1ZIRkxVbVkzUlhaRlExTlRWVzk0YjNCQk15OUthVVV4VldKWFNFWTNMMmx2Y0hkSGVFSkJORUpLZEdWaFlrbDNaSEpOTlM4dmIySmtRazEyUW5CWmJYUTJUeXR3V1ZBNWJHNDNSbWxpUVc1T1ExaFJaRTFGTms1TFZERm9jV2RwVkRCeWJIRk1RWEl2VVhNd1ZsWkZZVlpRSzJaM2FGVndhbGxtTUc4eFVtMXJjRFV2Y1ZJNVkxVTRRWEZHVW1WWFlWcDNhRmMwZUVaaFVUZGpVVzE1U1dwT04wNVNUR1I2YlRkaVRrNWFTVGRJVTI1QkwzaE9WWFJVT1RWWGNGWmxkSFpWWlUwMGJreFNUa3hXZDJWd1VubElTRFE0VldwVFFWTnJiMVF2UWxkbFJuSkdURTFYV1VZcmRYWjJWV2RvYTB4a01uVlVha013ZWtSRE1qRTRZMDVrV1ZjMGFWQnZMMVI2TVhCTWExbHNUWFZOUWxCbWEzaGhOMWxuUWsxbWQxWm5lUzlPUzBWNEwycGFSR3RqVHpBdlNXazVVM3BwTjNKbGFWRjNjV0p2U2xscFVHNVNaV05MZVhsSmNGSlFSazFKVkhwd04wbEJkR1l6UjA1RGNtMVVZMkY0ZEZGRVlYRjViblpIWlVOTWFWZzNXbWxpYjNnd2QyVnVWVFZoUzBkb2RETXJLMjVGVDBNM2NXNXNkVWQwUVRKT1MwUklWM1JES3poNFFuUjBZbWg0WVV0TWRqTnlUMFZvVVcwclpGVnNRV1JWWXpJd1dWaDFNekp3Ym0xTE4zbHdhVGxqTUdvelRXbFVRMWhXY25KR2NrOU9jMjlpZURGVkx5OVVhWE00Vm5Od1kwNDRVRUZDUm5vNWNXZ3ZaRTE1TmxoSGIxa3JkVlJMYUVOVlpXTlZTRkZNYVd4UGQwSnguSGtGOV9LMFVHZ2gwcG9yUTY2bzJFWkRlQlFTNi1pczVMcE8wTHFrVzJLMFhVY2JlanM0ZXdYRzBmUlFEVHlfV1RaOVg1aUQzNjNSaEdsa3BOTVB6OGc%3D&_eventId=submit'

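These values don't appear to have an expiry time.

After the POST request

Once the POST to login is sent, the response is not a 200 but a 302, along with several Set-Cookie headers: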

CASPRIVACY=;
  Max-Age=0; Expires=Thu, 01-Jan-1970 00:00:10 GMT; Path=/cas/; Secure
_pf0=yNlillHGbCY1t2AdPozsaozqFnWzSZo59I6ehvg2XsA=;
  Max-Age=86400; Expires=Thu, 12-May-2022 11:45:18 GMT; Domain=cas.hnu.edu.cn; Path=/; HttpOnly
_pm0=;
  Max-Age=0; Expires=Thu, 01-Jan-1970 00:00:10 GMT; Domain=cas.hnu.edu.cn; Path=/
_pc0=vUfpP1MGYarocLQgQgfAFRsxO2HyBGTzEcg+mGwf+7yewseSDEDYQSGEUIdSz+Oi;
  Domain=cas.hnu.edu.cn; Path=/; HttpOnly
iPlanetDirectoryPro=ZOMenIjL39gmMSKoZAowUqSSd9jimVIhN6ocARhLTD+zeFQPTWyBUSEZdxwS70lbXbLDi0WKL6LTsydgeCfSwJ+aeSPdfSfwj58sBE0eqSCboGqrMtXlz99xb+ia8dcBe2+vpFHGBjdT4we+WJyPs8+rzHOPMOf108uyV7M8zM9scARGxhoHOnADRQx33iKJPEP4qmlkBlw3A9MV9XFtO0MNV/iSn6bcSNujfUHFOa/xZ9fRsd5fMXFnFScsBEGgUxQKDpU02thXp7efiPP55ymb3c04Xova6g2M6qBl+xi1PfH2pfW5wyujRko4/istmw1dZBPEcB7tlaCw9SeXKgXs/viGuhI/JbeuzN7/kIBPtTm2bl48FamXdXohYElt;
  Domain=cas.hnu.edu.cn; Path=/

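My guess is that this whole dance exists just to set these Cookies, and that the main page identifies the user by checking whether the Cookies it sees are valid. You can verify this by comparing two visits to https://pt.hnu.edu.cn/: the only difference is whether the wisportalId Cookie is present.

This chain of redirects is tedious and complicated. In theory we could write a while loop in Python, but the requests library follows redirects for us automatically. After all the redirects, the Cookie header of the final PreparedRequest object (postLogin.request) holds the Cookies we want, wisportalId among them.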

>>> postLogin.history
[<Response [302]>, <Response [301]>, <Response [302]>, <Response [302]>, <Response [302]>]

Requests after a successful login

The request for the library data:

curl 'https://pt.hnu.edu.cn/api/hndxTsg/getBorrowBooks/detail?_=1652352553585' \
  -H 'Accept: application/json, text/plain, */*' \
  -H 'Accept-Language: en,zh-CN;q=0.9,zh;q=0.8' \
  -H 'Connection: keep-alive' \
  -H 'Cookie: route=a09caef73920180c25245f140604ba0b; wisportalId=WHYE4cN7TZAjB-AVLMHEjx56pELpBe5SbIrFjbMR' \
  -H 'Referer: https://pt.hnu.edu.cn/personal-center' \
  -H 'Sec-Fetch-Dest: empty' \
  -H 'Sec-Fetch-Mode: cors' \
  -H 'Sec-Fetch-Site: same-origin' \
  -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36' \
  -H 'sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="101", "Google Chrome";v="101"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  --compressed

Experimenting shows that the wisportalId Cookie alone is enough.

curl 'https://pt.hnu.edu.cn/api/hndxTsg/getBorrowBooks/detail' \
  -H 'Cookie: wisportalId=WHYE4cN7TZAjB-AVLMHEjx56pELpBe5SbIrFjbMR'

So this Cookie is our final result, and the key to everything after login.

The response is JSON: the data for every book I have on loan from the library.

{
    "data":[
        {
            "attachment":"0",
            "barcode":"XS0028533",
            "loancount":"0",
            "loandate":"2022-04-25",
            "returndate":"2022-05-25",
            "title":"未来简史:从智人到神人:a brief history of tomorrow"
        }
    ],
    "error_code":0,
    "message":"请求成功!"
}
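Going back to the original goal of a return reminder, a response of this shape can be turned into "days left" per book. A minimal sketch: the function name days_until_due is made up, the field names are taken from the response above, and the sample string is a shortened stand-in with a placeholder title.

```python
import json
from datetime import date


def days_until_due(response_text, today):
    """Return a list of (title, days left) for every borrowed book."""
    payload = json.loads(response_text)
    if payload.get("error_code") != 0:
        raise RuntimeError(payload.get("message", "request failed"))
    result = []
    for book in payload["data"]:
        due = date.fromisoformat(book["returndate"])
        result.append((book["title"], (due - today).days))
    return result


# Shortened stand-in for the real response; the title is a placeholder.
sample = (
    '{"data":[{"barcode":"XS0028533","loandate":"2022-04-25",'
    '"returndate":"2022-05-25","title":"Example title"}],'
    '"error_code":0,"message":"ok"}'
)

for title, days in days_until_due(sample, date(2022, 5, 20)):
    print(f"{title}: {days} day(s) left")
```

A negative number of days means the book is already overdue.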

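In practice wisportalId has a limited lifetime and stops working after a while. There is probably an endpoint for renewing it, but so far I haven't seen the system impose any strict limit on login frequency, so I can't be bothered to look into it for now; just log in again when it expires.

Python implementation

Sending requests with the Requests library

The requests library needs no introduction.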

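The final implementation has one shortcoming: the last round of redirects is handled for me by requests. What you see is that the last request comes back as 200 rather than 301/302, and the Cookie header on postLogin.request (a PreparedRequest; postLogin itself is a Response) has changed along the way. However, PreparedRequest doesn't seem to offer a public method for extracting Cookies... so all we can do is pull the Cookie out as a header and put it straight into the headers of later requests.

Rewriting this as a while loop that follows the redirects by hand might be better.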
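Such a hand-written redirect loop could look roughly like this. It is a minimal sketch, not what the code in this post does: follow_redirects and the fetch callable are made up for illustration, the fake site stands in for the real servers, and the loop is bounded so that a redirect cycle cannot spin forever. It only covers the GET hops that follow the first response to the login POST.

```python
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


def follow_redirects(fetch, url, max_hops=10):
    """Follow 3xx responses by hand.

    `fetch(url)` is a hypothetical callable returning (status, headers).
    Returns the final URL and the list of (status, url) hops.
    """
    history = []
    for _ in range(max_hops):
        status, headers = fetch(url)
        history.append((status, url))
        if status not in REDIRECT_CODES or "Location" not in headers:
            return url, history
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, headers["Location"])
    raise RuntimeError("too many redirects")


# Fake responses standing in for the real servers.
fake_site = {
    "http://example.test/login": (302, {"Location": "/step2"}),
    "http://example.test/step2": (301, {"Location": "https://example.test/home"}),
    "https://example.test/home": (200, {}),
}

final_url, history = follow_redirects(fake_site.__getitem__, "http://example.test/login")
print(final_url)
print([status for status, _ in history])
```

With requests, fetch could wrap a Session's get(url, allow_redirects=False) and return the response's status_code and headers; the Session would then also accumulate the Cookies set along the way.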

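Parsing HTML with BeautifulSoup

At first I tried to build a DOM with a parser like xml.dom.minidom, then remembered that HTML is not well-formed XML, and HNU's page may well not even be valid HTML...

So I went straight to BeautifulSoup, a library for parsing HTML that is often used in web scrapers.

One thing to watch out for: when fetching the value of the input element whose name is execution, the name keyword collides with a parameter of find() (where name means the tag name), so you have to use the attrs form: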

# Wrong: find(name=...) matches the tag name, so the element is not found
execution = pageSoup.find(name='execution').get('value')
# Right:
execution = pageSoup.find(attrs={'name': 'execution'}).get('value') # the name keyword collides with find()'s own parameter

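Calling js with Pyexecjs

I first tried PyV8, but it was a pain to install and I couldn't get it working after a lot of fiddling; then I found Pyexecjs as a replacement. It needs a JavaScript runtime such as Node.js to work.

Here we can call it by "precompiling" the js file: as long as a getResult() function is wrapped up in the js, Python can call it: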

with open('security.js', 'r', encoding='utf-8') as jsfile:
    jstext = jsfile.read()
ctx = execjs.compile(jstext)
encrypedPwd = ctx.call('getResult', exponent, modulus, password)
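The getResult() wrapper itself is not shown in this post. One way it might look, mirroring checkForm() from login.js, is sketched below as a Python string appended to the library source before compiling. This is an assumption, not the wrapper actually used: it takes for granted that RSAUtils is reachable as a global once security.js has been evaluated outside a browser, and with_get_result is a made-up helper name.

```python
# JavaScript wrapper mirroring checkForm(): build the key, reverse the
# password, encrypt it. Hypothetical; the original wrapper is not shown.
GET_RESULT_WRAPPER = '''
function getResult(exponent, modulus, password) {
    var key = new RSAUtils.getKeyPair(exponent, "", modulus);
    var reversedPwd = password.split("").reverse().join("");
    return RSAUtils.encryptedString(key, reversedPwd);
}
'''


def with_get_result(library_source):
    """Append the wrapper to the JavaScript library source."""
    return library_source.rstrip() + "\n" + GET_RESULT_WRAPPER


# Stand-in for the contents of security.js.
combined = with_get_result("var RSAUtils = {};\n")
print(combined)
```

The combined source would then be what gets passed to execjs.compile() in place of jstext.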

Complete code

This is crude, unpackaged code that just prints the data it gets.

username = '202108060109'
password = 'YourPassWord'

import requests
from bs4 import BeautifulSoup
import execjs

getPage = requests.get('http://cas.hnu.edu.cn/cas/login')
print('GET http://cas.hnu.edu.cn/cas/login : ' + str(getPage.status_code))
pageSoup = BeautifulSoup(getPage.text, 'html.parser')
execution = pageSoup.find(attrs={'name': 'execution'}).get('value') # the name keyword collides with find()'s own parameter
myJSESSIONID = getPage.cookies['JSESSIONID']
print('Cookie JSESSIONID: ' + myJSESSIONID)

myCookies = {'JSESSIONID': myJSESSIONID}
getPubKey = requests.get('http://cas.hnu.edu.cn/cas/v2/getPubKey', cookies=myCookies)
print('GET http://cas.hnu.edu.cn/cas/v2/getPubKey : ' + str(getPubKey.status_code))
modulus = getPubKey.json()['modulus']
exponent = getPubKey.json()['exponent']
_pv0 = getPubKey.cookies['_pv0']
print('Cookie _pv0: ' + _pv0)

with open('security.js', 'r', encoding='utf-8') as jsfile:
    jstext = jsfile.read()
ctx = execjs.compile(jstext)
encrypedPwd = ctx.call('getResult', exponent, modulus, password)
print(encrypedPwd)

authcode = ''
_eventId = 'submit'

myUA = r'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.3 Safari/605.1.15'

postData = {'username': username, 'password': encrypedPwd, 'authcode': authcode, 'execution': execution, '_eventId': _eventId}
postCookies = {'JSESSIONID': myJSESSIONID, '_pv0': _pv0}
postHeaders = {'User-Agent': myUA}
postLogin = requests.post('http://cas.hnu.edu.cn/cas/login', data=postData, cookies=postCookies, headers=postHeaders)
print('POST http://cas.hnu.edu.cn/cas/login : ' + str(postLogin.status_code))
print(postLogin.history)

cookies = postLogin.request.headers['Cookie']
headers = {'User-Agent': myUA, 'Cookie': cookies}
getBooks = requests.get('https://pt.hnu.edu.cn/api/hndxTsg/getBorrowBooks/detail', headers=headers)
print(getBooks.text)

That's a wrap 🎉

Next steps