The BookLamp website:
- What is the Book Genome Project?
- Why would this be useful? Aren’t there social engines for this sort of thing?
- What is our biggest criticism of our own tools? (Not enough books)
- So, you just want to make it faster to find new books, right? That’s your goal? (hint: NOPE)
- Where do you get your books?
- Is BookLamp free?
- Can I donate to BookLamp?
- How can I help BookLamp grow?
- I see you guys are a .ORG, not a .COM? On purpose?
- Why do you not have XYZ author, or XYZ book!?!? Do you hate them?
- What makes “Genres” and “StoryDNA” different?
- What is the difference between “StoryDNA” and BISAC codes?
- How can a book have 100% of one StoryDNA ingredient and more than 0% for any other?
- Will you be working with teachers or libraries?
- Why do some of your book covers look better than others?
- Why do you have typos in your book descriptions?
- How is BookLamp connected to CanGoogleHearMe.com (CGHM)?
- Book DNA? Sounds like alchemy. How does BookLamp work, exactly?
- Is BookLamp doing anything that raises copyright concerns?
- What is Motion, Pacing, etc… ?
- Hey, wait – book XYZ is anything BUT high Motion!
- Does this work with Nonfiction too?
- What happens when BookLamp’s engine goes insane?
- Isn’t this applying Science to an Art? Doesn’t this make you a terrible person?
BookLamp as a company:
- Who runs BookLamp? Are you an evil multinational corporation?
- How can I help BookLamp grow? (worth asking twice)
- How can I contact BookLamp?
Q: What is the Book Genome Project?
A: Founded in 2003, the Book Genome Project was created to identify, track, measure, and study the multitude of features that make up a book using computational tools. Begun independently by students at the University of Idaho in 2003, by 2008 the team included researchers and programmers from Stanford University, Florida State University, and Boise State University. Over time, partnerships were formed with commercial publishers, and the project became self-sustaining by 2010. It’s included collaborators from locations as diverse as New York, Idaho, California, and the United Kingdom.
Much like Pandora.com was created to provide a practical outlet for the Music Genome Project, we created BookLamp.org to allow readers and writers to use the tools that we’ve developed over the years. BookLamp is the public face and home of the Book Genome Project, so please check it out and let us know what you think.
Q: Why is this useful? Aren’t there social engines for this sort of thing?
A: There are a number of powerful and very cool social recommendation engines for discovery of books, for sure: Goodreads.com, LibraryThing.com, and Shelfari.com, among others. Amazon.com uses social feedback and buying behavior in their engine. They all have their strengths. One of the biggest concerns with social recommendation engines, though, is their natural pyramid structure and the “popular” bias they inherit from their dependence on public awareness to work. Fundamentally, marketing dollars often drive public awareness. Awareness drives what’s recommended and discoverable. As a consequence, the bestselling titles will typically dominate the tops of the suggestions at social recommendation sites, simply because more people know they exist.
A good example of this can be seen by looking at LibraryThing’s recommendations for The Da Vinci Code. Two of the top books being recommended are Memoirs of a Geisha and Harry Potter and the Goblet of Fire. It took us a while to figure out why those were there, because we considered them unusual comparable titles. LibraryThing’s system is normally pretty good. After a while, we realized that The Da Vinci Code, Memoirs of a Geisha, and Harry Potter and the Goblet of Fire all had movie adaptations released in theaters within a couple of months of each other. Consequently, the marketing dollars from the movie promotion got a lot of people to go out and buy all three books at the same time, and so LibraryThing sees them side by side on a lot of shelves. It makes sense, so you can’t fault it; it’s a limited example. But it’s not a stretch to point out that social awareness of a book drives its chances of showing up on a social recommendation site, and marketing dollars drive social awareness. Books people don’t know stay at the bottom.
The objective nature of what we do makes the BookLamp engine impervious to marketing dollars, or author popularity. We like to say, “We’re as likely to find you Richard Bachman as we are Stephen King.” For a long time, Richard Bachman – a pen name for Stephen King – sold a lot fewer copies than the Stephen King books, because no one knew they were the same author. Now, everyone knows, but back then, Richard Bachman almost certainly wouldn’t have been easily discoverable on the social recommendation engines that exist today, any more than a human could recommend a book that they didn’t know existed. BookLamp’s approach has its own strengths and weaknesses. One of the strengths is that we treat every book equally, regardless of popularity or marketing budget, and so can pull from the long tail.
Q: What is our biggest criticism of our own tools? (Not enough books)
A: Not having enough books. It’s the biggest “criticism” that we have about our own tools today. We only have books in the system that are provided to us by our participating publishers; if the book is published by a publisher we don’t work with yet, it won’t be in the database. Attracting publishers to the project to help grow our book database is one of the primary reasons BookLamp.org exists. Please help us do that. If you know a publisher, encourage them to work with us, or introduce us at firstname.lastname@example.org.
Q: So, you just want to make it faster to find a new book, right? (HINT: Nope)
A: Nope. I imagine that most book suggestion tools on the web aim for that, but certainly not us. We’re reformed. In the early days, we used to think that’s what we were trying to do. Make it so you can find a better book, faster. But after a while we realized that we didn’t like that, that it didn’t really fit how we interacted with our books as readers at all. In fact, we realized that we LOVE taking time to browse books at libraries and bookstores. When we get stuck at an airport, it’s the bookstore that we’re drawn to. The last thing we would want is to walk into a library, and have a person standing there that shoves a book into your hands and says, “You can go home without looking any more. This is the best book for you.”
In fact, that guy would be very annoying. And creepy. Probably wearing shades and looking like the Man in Black. We’d still want to go past that creepy, annoying guy and look at all the other books that are still waiting for us. We realized that – at least for us – the pain of finding a good book was NOT in the browsing around the shelves, reading covers, and discovering things. That was something we looked forward to. The pain was when we didn’t enjoy the books we discovered.
So, we threw away the concept of “better, faster”. It never really sat well, anyway. The painful part of finding good books, we decided, was not the point where you go around and find five books that you want to take home and try. Instead, the painful part is when you actually get home, and find that you only enjoy reading two out of those five books. Then it’s painful, because you already know from your careful search at the library that those other three SOUND interesting; you want to read them, and like them, and be engrossed in their stories. And even with that much wanting to like them, somehow they don’t hold you. That’s painful. You’ve just discovered and lost these three wonderful worlds that you were excited about visiting. Ugh. So our goal became to emulate the excitement of discovery you experience every time you walk into a musty library or an old bookstore – only we want our bookshelves to be stocked with only books you’ll want to read from cover to cover once you take them home with you. We want shelves not defined by genre, but personalized to you, so that you can glide painlessly from shelf to shelf as you follow whatever trail you find interesting.
Faster is not our goal. Enjoyment is our goal. We want you to glimpse the same world of possibility when you arrive at our site as you would walking in the front door of the largest, oldest, most mystical library you’ve ever seen. We want you to feel that wonder of worlds to be uncovered. If we could hire an old man that looks like a wizard to sit at the front desk, hand you a lit candle and a treasure map when you came in the door, we would. But there is no desk. There is no library. Just our website, and our attempt to connect you to that comfortable world while sitting at your computer wanting to get lost in the world of a book.
Q: Where do you get your books?
A: We try to work with publishers directly to make their books discoverable in our system. We’re working to connect with more and more publishers so that we can make their books discoverable on BookLamp.org, but that’s an ongoing task. At launch, BookLamp.org will include a large array of titles, ranging from Random House, Inc. to Kensington Publishing Corp.
Q: Is BookLamp free?
A: “There is no charge for the awesomeness. OR the attractiveness.”
BookLamp.org is a non-commercial service intended to demonstrate the cool potential of our tools, our engine, and to help facilitate the discovery of good books. It’s one application of a technology that we use in many different ways, and is here because we love books, and like people who also love books. We’re data nerds interested in changing the way the world finds content.
Q: Can I donate to BookLamp?
A: We get this a lot (thank you!). While it’s one thing to put our own time, energy and money into the project, we’ve never been comfortable accepting donations. If, however, you know a publisher that should have their books in our system, we will absolutely accept an introduction. Have them contact us at email@example.com, or e-mail us and cc them, and make the introduction yourself.
Q: How can I help BookLamp Grow? I really want to see this project succeed.
A: Our goal with BookLamp.org in this first year or so is to attract more publishers to the project. We can’t make their books discoverable on BookLamp without their books in our system. In other words, we need the publishing industry to get together and decide to “make us real” – put their weight behind what we do and see if we can build something cool.
You can help by introducing us to publishers. If you know a publisher that should be working with us, please make the introduction, or e-mail them and suggest they work with us. Our goal is to be one of the most connected companies in the publishing industry. Have them e-mail us at firstname.lastname@example.org. That’s where you can help. Spread the word, and let’s go out together and make this project into something really, really nifty.
Q: I see you guys are a .ORG, not a .COM? On purpose?
A: Yes, it actually IS on purpose. In fact, we actually own BookLamp.com, as well as .net, .me, and a bunch of other dots. If you go to BookLamp.com right now, you’ll be redirected back over to the .ORG domain. Why? Call it a matter of personal pride. As long as we’re only working with a few publishers in the industry – even the great publishers we already have with us – the Book Genome Project hasn’t reached its full potential. When we’ve got enough traction to take ourselves seriously as a company in the “real world” of grown-ups – well, we’ll shrug off the .ORG name, celebrate the transition with a pizza party, and carry on as BookLamp.com.
Until we’ve reached a point where we feel worthy of being taken seriously on a national and international stage, we’ll stay with the .ORG umbrella. What we don’t know is exactly how long it will take to get to that point. We’ll see.
Q: Why do you not have XYZ author, or XYZ book!?!? Do you hate them? This makes me angry!
A: The content used to produce the data in our system is provided by our publisher relationships. If there is a book you’d like to see in the system, look up the publisher and fire them an email suggesting they talk with us. You can also send the request to us directly – if we receive enough such requests, then we will have something to show publishers when we talk with them about working with us.
Q: What makes “Genres” and “StoryDNA” different?
A: “StoryDNA” – also referred to as “Thematic Ingredients” – is a database we at BookLamp continue to develop, which contains objectively-derived thematic information about each book in our corpus. The “Genres” are pulled from an industry-standard database, and are provided to us by the publisher. StoryDNA does not line up to genre labels, but there are certain StoryDNA ingredients that appear more in some genres than others. For example, the thematic ingredients that have to do with magic and dragons appear more in Fantasy novels than anywhere else. Horses and dry terrain tend to show up in Westerns, etc.
We actually discourage the use of genre labels, in concept, even though people are used to using them to filter and discover books. In our perspective, if the book you like has a certain mix of Horses, Guns, and Romance, you should give another book with a similar make-up a chance, even if its genre is different than you’d normally read. Expand your horizons.
Q: What is the difference between “StoryDNA” and BISAC codes?
A: BISAC codes are an industry standard way of classifying books, and they’re very useful. They’re like a beefed up genre classification. However, StoryDNA is different in that we’re measuring both the presence of an ingredient, and also the amount of that ingredient. A book with 90% Vampires is a very different book than one with 5% Vampires, but both would probably receive the same label in the BISAC classification. This is why we also tend to refer to our StoryDNA as Thematic Ingredients, because both the “What” and the “How Much” are equally important to understanding what’s in the book.
Additionally, every element in our StoryDNA structure is measured in every book, even if the resulting score is zero. So while BISAC is assigned by a human and books normally only have two or three BISAC labels at a time, all StoryDNA is applied objectively across all books in our corpus. In other words, a book is defined not only by, say, the presence of Vampires, but also the absence of Urban Environments.
Q: How can a book have 100% of one StoryDNA ingredient and more than 0% for any other ingredient?
A: The percentage scores are representative of how much of that topic a book has compared to the other books in our database. A value of 85% means that the book has more of that Thematic Ingredient than 85% of the books in our system. A measure of 100% means it is one of the highest scoring books in our system for that particular thematic ingredient. Each ingredient is measured this way independently.
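To make the percentile idea concrete, here is a minimal Python sketch. It is not BookLamp's actual code: the book titles, the raw amounts, and the `percentile_scores` helper are all invented for illustration, and we assume a score is simply the share of the other books in the corpus that have less of the ingredient.

```python
from bisect import bisect_left


def percentile_scores(raw_amounts):
    """Turn raw ingredient amounts into 0-100 percentile scores.

    Assumption for illustration: a book's score is the percentage of the
    OTHER books in the corpus that contain less of the ingredient.
    """
    ordered = sorted(raw_amounts.values())
    others = len(ordered) - 1
    if others < 1:
        # A one-book corpus has nothing to compare against.
        return {title: 0 for title in raw_amounts}
    return {
        # bisect_left counts how many books have a strictly smaller amount.
        title: round(100 * bisect_left(ordered, amount) / others)
        for title, amount in raw_amounts.items()
    }


# Hypothetical raw measurements for two ingredients across a five-book corpus.
vampires = {"Book A": 0.40, "Book B": 0.05, "Book C": 0.00, "Book D": 0.10, "Book E": 0.02}
forests = {"Book A": 0.03, "Book B": 0.20, "Book C": 0.01, "Book D": 0.00, "Book E": 0.15}

vampire_scores = percentile_scores(vampires)
forest_scores = percentile_scores(forests)
print(vampire_scores["Book A"], forest_scores["Book A"])
```

In this made-up corpus, Book A lands at 100% for Vampires and 50% for Forests at the same time. Each ingredient is ranked against the other books separately, so the scores for a single book never need to add up to 100.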
Q: Will you be working with teachers or librarians?
A: Yes, in fact, we already are. We have teachers and librarians in our advisory group and in our research division. But, hey, if you are a teacher or a librarian and you see something here that strikes you as exciting or as something you’d like to bring into your classroom or your stacks, please send us an email at email@example.com. We’d love to talk with you!
Q: Why do some of your book covers look better than others?
A: Most of the book covers we’re using at the moment are provided to us by the publishers already in our coalition. Unfortunately, a lot of times the book covers in the electronic realm are not as pretty as the covers found on the bookshelves. That will change over time.
As a result, we’re also using covers provided by OpenLibrary.org, which has a fantastic collection of book covers available for use, and tends to have nice looking covers. They have a great mission, and we encourage you to check them out.
Q: Why are there typos in your book descriptions?
A: Like book covers, our Publisher Descriptions are provided to us in a standard ONIX record. We don’t edit or change those descriptions at all, at the moment, so we don’t claim any typos you might find. We might engage in some data cleaning in the future, but not now. In the meantime, it’s worth noting that BookLamp’s engine doesn’t use external metadata in determining the suggestions; the genre and descriptive data are just there to help us humans figure out what we’re looking at.
Q: How is BookLamp connected to CanGoogleHearMe.com (CGHM)?
A: CanGoogleHearMe.com is a website created by Aaron Stanton early in the life-cycle of Novel Projects, Inc. CGHM received international attention – including being covered by news agencies like the BBC, ABC, PBS, WIRED, PCWorld Magazine, City News International and others, reaching all across the globe – G’day mate! A team of passionate programmers worked several years with Aaron to develop BookLamp. It was a beginning for the project, but it took many years of hard work to actually build the tools that power BookLamp’s suggestions.
Q: Book DNA? Sounds like alchemy. How does BookLamp work, exactly?
A: It’s complicated and technical, and – to be straightforward – quite extensive. It’s the product of years of development by some very good minds (excluding the author of this FAQ). To go into it in detail would take a great deal of time. Instead, I’ll direct you over to Bookgenome.com. Also, I’ll copy and paste a description of the Book Genome structure that was posted by our founder in our forums a little while ago:
“The concept of the “Book Genome Project” appeals to a good number of people in concept, but can become confusing when you try to actually define what you’re trying to measure. What is the DNA and RNA of a book? How can you extract it meaningfully and accurately at scale? Unlike music, books are long and complicated, making it difficult to measure the literary equivalent of “beats per minute”, or the type of instrument used, as in the Music Genome Project run by Pandora.com. Unlike with music, where you know most of what there is to know about the song within the first 80 seconds, a book rarely contains the fundamental essence of the story in the first chapter alone, and so the rise and fall of more subtle elements on a chapter-by-chapter basis is important to track.
So what is BookLamp’s genome structure? I’ll briefly touch on one branch of the multi-part genome structure, what we call StoryDNA. Fundamentally, a story is made up of many components, but a good portion of it has to do with setting and content. Where does a story take place, and what are the elements that physically act in that story? The BookLamp algorithm, when it looks at a book and breaks it down for comparison to other books, divides the genome into two components of StoryDNA, known as Setting and Actors. Story Setting is defined by the environment that a story appears in, such as whether it takes place in the forests, in the city, or on the sea. Story Actors, on the other hand, are the elements that act in the settings. An example of a Story Setting is the amount of “Forests & Trees” that appears in a book, vs “City Streets & Urban Environment” – two very different Story Settings. An example of a Story Actor would be “Medieval Weapons & Armor” – a physical instance that acts in the environment. To put this in perspective, a book with 30% Forests & Trees, along with 10% Medieval Weapons would be a very different story than 30% City Streets & Urban Environment, and the same 10% Medieval Weapons.
Because the software measures on a scene-by-scene basis, we know the exact make-up of the 36th chapter of a book, for example, as well as the first chapter. To give you a perspective of the depth of the system, we currently measure and store a little over 30,000 points of data for every book we analyze, giving us a database with literally hundreds of millions of elements across the corpus at our relaunch, with full expectation that it’ll grow into the billions within a few months. We don’t talk much about the academic history of the project, but the research and development required to engineer and build this system over the last few years has been significant, pulling in skill sets from engineers and researchers from universities all around the world. This level of detail, extrapolated across a database of tens of thousands of books, provides a very interesting picture of the world of literature.
While the value of StoryDNA is great, it is only one branch of our Book Genome structure. The Language and Character DNA are also very important elements of what makes up a book, and can be critical to how a reader responds to a book. No story elements are worthwhile unless the medium used to deliver them (the language) has a minimum level of appeal, as well. After all, despite Twilight and Romeo and Juliet sharing many storyline similarities, I doubt that floods of Stephenie Meyer fans ever instinctively ran out and bought copies of the Shakespearean plays. The language differences naturally represented tremendous barriers to enjoyment for some readers.
It’s also worth noting that only a small portion of what makes a book appeal to a reader can be found in the objective analysis of the genome structure; our goal as a project and website is to measure what can be measured, and place that data in the hands of the user to help them discover the books they’re interested in.”
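As a rough illustration of the Setting and Actors example in the forum post above, here is a small Python sketch. It is only a toy under stated assumptions: the three-ingredient list, the book profiles, and the choice of plain Euclidean distance are ours for the sake of the example, not a description of how the BookLamp engine actually compares books.

```python
import math

# Hypothetical, tiny ingredient list; the real genome tracks far more than this.
INGREDIENTS = [
    "Forests & Trees",                   # a Story Setting
    "City Streets & Urban Environment",  # a Story Setting
    "Medieval Weapons & Armor",          # a Story Actor
]


def to_vector(profile):
    """Score every ingredient for the book, using 0 when it is absent."""
    return [profile.get(name, 0.0) for name in INGREDIENTS]


def story_distance(book_a, book_b):
    """Euclidean distance between two profiles (smaller means more alike)."""
    return math.dist(to_vector(book_a), to_vector(book_b))


# Made-up profiles that mirror the percentages used in the post above.
forest_tale = {"Forests & Trees": 30, "Medieval Weapons & Armor": 10}
city_tale = {"City Streets & Urban Environment": 30, "Medieval Weapons & Armor": 10}
other_forest_tale = {"Forests & Trees": 25, "Medieval Weapons & Armor": 12}

print(story_distance(forest_tale, city_tale))          # far apart
print(story_distance(forest_tale, other_forest_tale))  # close together
```

Even though both of the first two books share the same 10% Medieval Weapons, the forest book and the city book end up far apart, because the missing setting counts as a zero and not as "unknown". That is the sense in which a book is also defined by what it does not contain.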
Q: Is BookLamp doing anything that raises copyright concerns?
A: Nope. BookLamp works directly with book publishers to obtain content. Our goal is to make it easy for readers to discover a publisher’s books, and make the front, mid, and backlist accessible in a new and powerful way. Most publishers like that idea and seem enthusiastic to work with us.
If you know a publisher that should be working with us, feel free to make the introduction at firstname.lastname@example.org. You can read more about how you can help the project in the “How can I help” question farther down.
Q: What is Motion, Pacing, etc… ?
A: Motion, Density, Description, Dialog and Pacing are stylistic metrics or terms developed to help make the complicated under-workings of our analysis more understandable. They are not the complete picture of what makes up a book’s writing style, nor a complete picture of what BookLamp tracks in a book, but they do measure elements that a person can easily understand.
- Motion: Motion refers to the level of physical motion in a scene or book.
- Description: Description refers to the level of descriptive language that the author uses in his or her writing.
- Pacing: Pacing refers to the layout of the text on the page. A scene with high Pacing will have characteristics that quickly move the reader’s eye down the page.
- Density: Density refers to the complexity of the text. Text with high Density will take longer to read than a text of equal length with low density.
- Dialog: Dialog refers to the amount of spoken text between two or more characters in a scene.
It’s also worth pointing out that we capitalize these because we’re defining them inside the context of what BookLamp measures. Whether or not our definition of Density matches yours is a hard thing to answer, but what we can say is that we measure our version of Density consistently across all our books. So if you like a book with a certain level of Density, we’ll try to find you one with a similar level of Density.
Q: Wait, what? Book XYZ is anything BUT high Motion!
A: After a great deal of testing and training against human feedback, we think the tools are pretty good. If you come across a book that you’ve read, and you don’t think the metrics are correct, it’s most likely a matter of definition. We’re likely using a different definition of what “Pacing” might mean, for example. Pacing might mean something different for you than for me. The only assurance we can make in that case is that we measure whatever “it” is consistently, so if you hand us a book with a certain level of Motion, we’ll find you other books with the same level of Motion. In which case whether our scales match each other won’t make a difference, as both books will be similar when you actually sit down and read them.
That said, we don’t claim to be perfect. It’s also possible you found one of our zingers, which do exist. Sometimes we get odd scores – you can see some of the more amusing ones we’ve noticed a little farther in this FAQ, under “When BookLamp Goes Insane”. And if it is just a weird score on our part, we’re always working to get better.
Q: Does this work with Nonfiction too?
A: Yes, though our initial focus was on Fiction since a keyword search does not work particularly well on books that are not defined by a single theme. But our techniques are perfectly suited to Nonfiction, as well, and you will notice that both Fiction and Nonfiction titles are found in our database.
Q: What happens when BookLamp’s engine goes insane?
A: We try hard to get everything right, but we don’t claim perfection, either as people or as an engine. As with any technical system, there are always rare cases where the engine makes a mistake, perhaps labels something incorrectly, or connects books in unexpected ways. During testing, we’ve run into some very amusing examples.
- Why does Toe-up Socks for Everybody – what appears to be a book on knitting – score as having a high level of “Crippling Injury / Accidents / Emergency Aid”? I’ve never read the book, but it sure seems odd.
- The Body Has a Mind of Its Own scores high for Descriptions of Physical Intimacy, but it seems to be a clinical book about how the body works. With that one, at least the connecting thread exists – I can see how that might happen – but it’s still funny.
- The Road, by Cormac McCarthy gives you some good matches, followed by a children’s book called The Toymaker.
- UPDATE: Turns out we should have trusted ourselves more on The Toymaker. We assumed that since The Toymaker is classified as a children’s book, The Road is too dark to be a good match. Well, after reading it, it turns out that The Toymaker has got to be one of the creepiest, darkest children’s books ever. It shares a lot more in common with The Road than I would have expected at first.
These sorts of issues do and will exist. We try our best to minimize them, and learn from them when we find them, and fix them in future updates. That said, it also helps if you view them with a sense of humor, send us a note about it when you find one, and then remind yourself that the reason that one example stands out is because the engine has done such a good job on all the titles around it. It’s easy sometimes to overlook how really hard a problem the engine undertakes every time you perform a search. The fact that, out of 20,000 books, Dan Simmons’s Hyperion appears in the top suggestions for Peter Hamilton’s Pandora’s Star – one of the best recommendations I can possibly imagine, having read both books – absolutely blows me away.
It’s easy to forget the scale and complexity of what the system does, because it does it really fast, and generally does it very well. Mostly.
SubQ: Sure, that’s nice, but seriously – why do two books that don’t appear to be a good match sometimes show up together?
A: Every once in a while the BookLamp engine tosses out what we call a “zinger,” a book that just doesn’t seem to make sense as a comparable title for the search book. From an algorithmic perspective it makes perfect sense: something in that zinger is similar to the search book, but the shared features are probably not ones that are particularly visible to a human. A book might match because it has the same amount of Forests, for example, but be off on the big themes, like Vampires. Mathematically accurate, but from a human perspective it’s still a zinger. Most zingers pop up when the system runs out of close matches. In other words, the system is only as good as the corpus of books we have in our database, and if there just isn’t another close match in the corpus, the system will still find the best match that it can. This is partly why some books seem to perform better than others. Sometimes this can lead to wonderfully serendipitous suggestions. Other times it can lead you to think the engine has gone insane. As the BookLamp corpus gets bigger, there will be fewer zingers.
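Here is a minimal Python sketch of why a nearest-match search always returns something, even when nothing is really close. Everything in it is hypothetical: the two-ingredient profiles, the book titles, the `suggest` function, and the distance cutoff used to flag a likely zinger are our own illustration, not the engine's real logic.

```python
import math


def suggest(query, corpus, k=3, zinger_cutoff=15.0):
    """Return the k nearest books as (title, distance, looks_like_zinger).

    The search never comes back empty: if the corpus has no close match,
    the best remaining book is still returned, however far away it is.
    """
    ranked = sorted((math.dist(query, profile), title) for title, profile in corpus.items())
    return [(title, round(d, 1), d > zinger_cutoff) for d, title in ranked[:k]]


# Made-up profiles in the form [Forests, Vampires].
search_book = [40, 60]
corpus = {
    "Close Match": [38, 55],
    "Near Match": [45, 70],
    "Forest Only": [40, 5],  # same amount of Forests, almost no Vampires
}

results = suggest(search_book, corpus)
for title, distance, looks_like_zinger in results:
    print(title, distance, looks_like_zinger)
```

With only three books to choose from, the third suggestion is "Forest Only". It matches the search book perfectly on Forests, but it is far off on Vampires, so it gets flagged. In a bigger corpus, closer books would usually push it out of the top results, which is why a larger database should mean fewer zingers.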
Q: Isn’t this applying Science to an Art? Doesn’t this make you a terrible person?
A: It is applying science to art, but we also find that people tend to assume that means something it actually doesn’t. BookLamp doesn’t “judge” a book. We don’t deal with quality, or good writing or bad writing. Instead, we simply measure things. For example, we measure whether a book is likely a 1st person perspective or a 3rd person perspective. If someone then tells us that they like 1st person titles, we’ll try to help them find more 1st person titles. But we don’t try to judge if being 1st person is “better” or “worse” than 3rd person. That’s not our game, and we’re not qualified or interested in doing so.
If you think about it, what we’re basically doing is championing the idea that the contents of a book – what the author actually wrote – should be the primary consideration when finding new books for readers. It shouldn’t matter that one book has a million dollar marketing budget, and the other is from a new author with no track record at all. If the content between the covers is a good match, that should be all that matters.
All of which can be summed up with the classic saying, “Don’t judge a book by its cover.”
Consequently, we are neutral in terms of whether a book is in the front, mid, or backlist, or written by a new author or an old one. The tools are as capable of finding you a book by Richard Bachman as they are of finding you a book by Stephen King. This is part of the reason our suggestions will tend to be different from socially driven suggestion engines; we have no social or popularity bias. As a new author, you have as much authority – as much of a voice – inside of BookLamp as any other author in the world.
Q: Who runs BookLamp? Are you an evil multinational corporation?
A: Not yet! BookLamp is an independent website operated by Novel Projects, Inc., the company founded in 2003 to start the initial development around the Book Genome Project. The project is managed and developed by a team based out of Boise, ID, though we have team members working in New York and California as well. That’s our rag-tag band of adventurers. And our Analysis Monkey, of course, who we keep in a cage in the back of the room to assign book scores for us (we’re kidding – there is no monkey. Or cage.).
Q: How can I help BookLamp Grow? I really want to see this project succeed.
A: Yes, we put this question in here twice on purpose. We really want your help. Our goal with BookLamp.org in this first year or so is to attract more publishers to the project. We can’t make their books discoverable on BookLamp without their books in our system. In other words, we need the publishing industry to get together and decide to “make us real” – put their weight behind what we do and see if we can build something cool.
You can help by introducing us to publishers. If you know a publisher that should be working with us, please make the introduction, or e-mail them and suggest they work with us. Our goal is to be one of the most connected companies in the publishing industry. That’s where you can help. Spread the word, and let’s go out together and make this project into something really, really nifty. Have them contact us at email@example.com.
Q: How do I contact BookLamp?