There is a lot of discussion in many jurisdictions about whether it is fair use/dealing to use copyrighted works available on the internet for training an AI. An example today in the UK is here.
My understanding is that fair use only applies if one legally acquired the original from which the copy is made. There are strict laws on unauthorised access to websites. Authorisation is generally given to humans through the terms and conditions, and is implied or given to machines through the robots.txt file.
If one provided authorisation only to humans in the terms and conditions, and excluded any creative content in the robots.txt (as this place does), would the question of fair use be moot, in that the AI company never had authorisation to access the information in the first place, so any fair use defence would fail?
To try to be more specific with an example, suppose Alice puts a creative work on the web with Terms and Conditions focused on GDPR compliance that put no limits on who is allowed to access the site but do not grant a licence to use the content for AI training, and a robots.txt saying allow all, for SEO purposes. Bob puts a similar creative work on the web with Terms and Conditions that say something like "Licence to access this site is given only to natural persons over the age of 18" and a robots.txt saying disallow all. An AI company scrapes both works for AI training and claims that this is fair use/dealing. Is it possible/likely that the fair use claim could be accepted for Alice but not for Bob?
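To make the technical half of the example concrete, here is a minimal sketch of what a well-behaved crawler would see in each case, using Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleAIBot` user agent, the `may_fetch` helper and the `alice.example` / `bob.example` URLs are all hypothetical, made up for illustration. It only shows the machine-readable signal; it says nothing about the legal effect of ignoring it.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt files for the two sites in the example.
ALICE_ROBOTS = "User-agent: *\nAllow: /\n"     # "allow all"
BOB_ROBOTS = "User-agent: *\nDisallow: /\n"    # "disallow all"

# Hypothetical crawler name and URLs.
alice_ok = may_fetch(ALICE_ROBOTS, "ExampleAIBot", "https://alice.example/work.html")
bob_ok = may_fetch(BOB_ROBOTS, "ExampleAIBot", "https://bob.example/work.html")

print("Alice permits crawling:", alice_ok)
print("Bob permits crawling:", bob_ok)
```

A crawler that honours robots.txt would fetch Alice's page and skip Bob's. A crawler that scrapes both has ignored Bob's stated exclusion, which is the situation the question asks about.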
In adding the above edit I note that the UK "Text and data mining for non-commercial research" exception specifically says "if they already have the right to read the work (that is, they have ‘lawful access’ to the work)". I guess the question could be reduced to: does this lawful access condition also apply to other fair dealing situations, such as private study, criticism, review and reporting, and search engine indexing, and could that logic extend to AI training, which is soon to be decided on?