Transparency in the Age of Big Data


I started thinking about this yesterday, after posting about the Kindle Unlimited scammers. Or I guess I mean I started thinking about it again, because after all I’m generally far too interested in transparency, privacy, accountability, designing systems…you know: fun 4:00am thoughts.

Yep, it’s actually that early in the morning here. It’s not at all unusual for me to be awake right now. (By which I mean at this time of day–I guess “right now” only happens once per utterance/typing/whatever.) I do this weird napping thing (aka polyphasic sleep, and my approach to it has changed over the last year or so, and maybe I should blog about that sometime or other), so I’m awake at all sorts of strange-to-the-uninitiated times.

On the other hand? In the interest of transparency, maybe I should tell you: I really ought to be writing new fiction right now. That’s what my middle-of-the-night awake-time is for! And yet I’m not. More on this in a bit.

So, yesterday and transparency. I suggested that Amazon might not be totally accurate (or even close) when they track the number of pages people read after borrowing Kindle Unlimited ebooks. Which means the payments they make to authors/publishers are quite possibly not totally accurate. Actually, given that the scamming techniques I blogged about do in fact work, I’m quite certain that Amazon is being less than totally upfront about the calculations that go into Kindle Unlimited payments. (It’s also true that Amazon cannot possibly count “pages read” for people who download borrowed ebooks and transfer them to an offline reading device, or convert them to a different format before reading, but that’s probably not all that significant in a statistical sense.) The only saving grace here is that Amazon is, judging by their past behavior, extremely unlikely to detail the ins and outs of their systems to the public at large. Unless a whistleblower arises?

It’s sort of funny, to me, to think about the reactions various folks would have if the truth did come out.

There are those who would point out how they were right all along that writers shouldn’t have granted Amazon an exclusive license to sell their ebooks, even for 90 days at a time, especially since writers participating in KU never actually know from month to month how much Amazon is going to choose to pay them. It’s not just that writers don’t know how many pages of their own material will be read–they also don’t know how many total pages, across all of Kindle Unlimited, will be read. Nor do they know in advance how large a pool of money Amazon is going to divide among them. But other folks will point out that, in fact, they’ve made more money since Kindle Unlimited came along. Or that a lot of folks who used to buy indie-published fiction have (mostly?) switched to Kindle Unlimited, so their non-KU sales have gone down (unless there are other, less nefarious but more troubling reasons for a drop in sales). Or whatever. Generally, no matter what happens, people will spin it so it turns out they were right all along.

Maybe, in this case…full transparency wouldn’t really help anyone. Except, I guess, people who want to become irate, or seem irate, and blog or otherwise rant about this sort of thing. After all, the underlying question for writers is still this: is Kindle Unlimited currently the best choice? Or not?

Full disclosure: I don’t really care. I figure these things come and go. But I do care about privacy…so I don’t let my Kindle talk directly to Amazon. Ever.

I care about your privacy too! I don’t use Google Analytics, or any other system reporting to a central database, to keep track of this site’s users. (I do use Piwik, but it creates a local database only. Unless it’s been hacked.) I encrypt all communication with this site. It’s not that I think the data about who’s reading what page at any given time is especially important–it’s that I figure default behavior should be “do not track people” or at least “do not go out of your way to upload people’s information to databases that are out of your control without their explicit understanding and consent”…and yes, I realize that even this level of concern means I’m an outlier here. (Incidentally, blocking JavaScript or using a plugin like Ghostery will also block the tracker I use. And Google Analytics, and many others. OTOH, web servers usually log most of the same information by default.)

It’s just: Amazon gets a whole lot of information about people from their shopping (and reading) habits. I don’t know how innocuous all that will turn out to be, over time. I wouldn’t bet against huge chunks of that data becoming public, or nearly, at some point in the future. Combine all that with other data we all shed daily, and…welcome to the panopticon. So does it matter at all that I don’t like the idea that Amazon tracks the number of pages read? Nope! Not even a little. It’s trivial. And yet, I still don’t like it and go out of my way to hide at least that little bit of information from them.

I’d go on about big-picture issues revolving around data, supposedly “anonymized” or not, transparency, and privacy–but I already wrote that book. I think there’s plenty of reason for concern here, but I probably don’t need to write it all again in a blog post. If you’d like, you can go buy it for 99 cents at Amazon. In the spirit of transparency, though, I should probably tell you it’s easy to find “pirated” copies. And no, that doesn’t bother me–hell, the book’s damn near a piracy manual to begin with. Not actually my goal when I wrote it, just as I didn’t intend for it to be (primarily) a guide to hacking Wi-Fi networks, phone calls, or computer systems in general…but I felt it was necessary to describe various threats in order to talk about defending against them. Well, that and it was fun to include the info. Want more transparency? That 99-cent price is pretty good, for sure, and I just lowered it a day or two ago. But if you hold off another week or two, it’ll be free at lots of non-Amazon sites. Well, those and the pirate sites. So go have fun with it, if you want to. Bonus: it tells you all about the time I went to jail for trying to protect students’ privacy. My privacy too, and I was also protecting their finances and such, and I was slapped down for it…in a sleazy “we have our own police department and all that annoying news coverage was six months ago, so we’re pretty sure we can get away with it” kind of way. But did it really hurt me, financially or, uh, reputationally? Nope. Didn’t. It just focused my attention a bit, is all.

But! This post is not that book. So. I’m going to try something new. I’ve been thinking that it’s easier to improve systems and processes that are actively tracked. Or at least it’s easier to prove that improvement happened, right? And transparency helps with that, as privately held data can be ignored fairly easily. Like this: Amazon no doubt knows damn well that KU pages read are only kinda-sorta tracked, right? But saying so wouldn’t help them, or readers, or necessarily writers either. In fact, if the “fix” would be for them to design a system that gathers even more data on their customers? I’d just as soon skip the whole thing. No matter how they’ve designed their system, some group or other will complain about it. And the more we know about it, the easier it’ll be for people to figure out how to hack the system. So no matter how badly it’s set up, I hope they keep it to themselves.

So in this case, for this project, I don’t want to keep the data private. “What project, dammit?” I imagine you demanding at this point. Well, writing new fiction, of course! That thing I’m not actually doing right now. (Told you I’d come back to that.)

So, for the next 30 days, I’m going to post daily fiction totals. Which means I’ll need to write a lot of blog posts too. Not sure what they’ll be about, but if inspiration fails I’ll just do a lot of book reviews. I’ve been wanting to do that for quite a while.

I’m not sure what data, other than new words written, I’ll be posting. We’ll see how it goes. Do I even want to identify the books/stories I’ll be working on? Maybe. I guess I can give them code names. That’ll be cool.

Anyway, I could in principle track this on my own. But I’d probably only do it for a day or two. Telling you folks about it makes it more likely that I’ll bother with the follow-through.

So. Is transparency generally a good idea? I’d say: it depends. This time, for me, probably so. Often, I’d say it doesn’t help at all.

Thoughts, whilst you have fun out there in the world?


Published in My Fiction, Random Rants