Added kdocs for DataSchema and DataRowSchema #1775
base: master
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,58 @@ | ||
| package org.jetbrains.kotlinx.dataframe.annotations | ||
|
|
||
| import org.jetbrains.kotlinx.dataframe.api.cast | ||
| import org.jetbrains.kotlinx.dataframe.api.convertTo | ||
|
|
||
| /** | ||
| * Annotation to generate extension properties API for a given declaration, according to its properties. | ||
|
Collaborator
each line in this KDoc starts with two spaces instead of one |
||
| * Annotated declaration should be non-local and non-private interface or a class. | ||
|
Collaborator
*Annotated declaration should be a non-local and non-private interface or class. |
||
| * The aim here is to provide convenient syntax for working with a dataframe instance right after reading from it CSV, JSON, Databases, Arrow, etc. | ||
|
Collaborator
*a convenient syntax
Collaborator
*databases |
||
| * After `val df = DataFrame.read*` operation, `df` is a source of truth for the DataSchema. | ||
|
Collaborator
*After the
Collaborator
*the source of truth
Collaborator
*data schema (the concept) |
||
| * One way to look at it, DataSchema "tells" the compiler what's already there. It doesn't affect reading. | ||
|
Collaborator
*One way to look at it is that the data schema... |
||
| * See the list below of code generation methods to simplify the process of getting what we call an initial dataschema. | ||
|
Collaborator
Hmm, maybe "Check out the 'See Also' section?" Depending on how it's rendered, it's not really a list |
||
| * Given the initial schema of the data you read, the compiler plugin will provide a typed result for most operations. | ||
|
Collaborator
link to compiler plugin |
||
| * | ||
| * Example: | ||
| * ``` | ||
|
Collaborator
please annotate any code samples with the right file format to get highlights, anywhere you use Markdown ;P |
||
| * @DataSchema | ||
| * data class Group( | ||
| * val id: String, | ||
| * val participants: List<Person> | ||
|
Collaborator
inconsistent trailing commas |
||
| * ) | ||
| * | ||
| * @DataSchema | ||
| * data class Person( | ||
| * val name: Name, | ||
| * val age: Int, | ||
| * val city: String? | ||
| * ) | ||
| * | ||
| * @DataSchema | ||
| * data class Name( | ||
| * val firstName: String, | ||
| * val lastName: String, | ||
| * ) | ||
| * | ||
| * fun main() { | ||
| * val url = "https://raw.githubusercontent.com/Kotlin/dataframe/refs/heads/master/data/participants.json" | ||
|
Collaborator
inconsistent spacing. I might recommend using the |
||
| * val df = DataFrame.readJson(url).cast<Group>() | ||
| * val i: Int = df.id[0] // properties style access to columns and values | ||
|
Collaborator
you can come up with a better name than |
||
| * | ||
| * val df1 = df.asGroupBy { participants }.aggregate { | ||
| * count() into "groupSize" | ||
| * distinct { city } into "cities" | ||
| * } | ||
| * | ||
| * // now compiler plugin uses previous knowledge of `Group` combined with its understanding of aggregate operation | ||
|
Collaborator
*the compiler plugin |
||
| * // to help you access new columns | ||
| * val l: List<String> = df1.cities[0] | ||
| * } | ||
| * ``` | ||
| * | ||
| * @see [org.jetbrains.kotlinx.dataframe.api.generateDataClasses] | ||
| * @see [org.jetbrains.kotlinx.dataframe.api.generateInterfaces] | ||
| * @see [org.jetbrains.kotlinx.dataframe.DataFrame.cast] | ||
| * @see [org.jetbrains.kotlinx.dataframe.DataFrame.convertTo] | ||
| */ | ||
| @Target(AnnotationTarget.CLASS) | ||
| public annotation class DataSchema(val isOpen: Boolean = true) | ||
|
Collaborator
We should probably explain what |
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,9 +1,24 @@ | ||
| package org.jetbrains.kotlinx.dataframe.api | ||
|
|
||
| import org.jetbrains.kotlinx.dataframe.DataFrame | ||
| import org.jetbrains.kotlinx.dataframe.annotations.DataSchema | ||
|
|
||
| /** | ||
| * Marker interface that's automatically added to classes annotated with [DataSchema] | ||
|
Collaborator
... why? (because it can help with .append() etc.) Also, we should specify how this is added (with the compiler plugin) and that it's added only to data classes, not to interfaces (right?) and why. "Added" -> "Added as supertype" |
||
| */ | ||
| public interface DataRowSchema | ||
|
|
||
| /** | ||
| * Example: | ||
| * ``` | ||
| * @DataSchema | ||
| * data class Person(val name: String, val age: Int) | ||
| * | ||
| * fun main() { | ||
| * val df = dataFrameOf(Person("Alice", 30), Person("Bob", 25)) | ||
|
Collaborator
single space indent ;P Same story, try
@ExcludeFromSources
private interface Sample {
    // to make it compile without the compiler plugin
    fun <T> dataFrameOf(vararg rows: T): DataFrame<T> = TODO()
    // SampleStart
    @DataSchema
    data class Person(val name: String, val age: Int)
    fun main() {
        val df: DataFrame<Person> = dataFrameOf(Person("Alice", 30), Person("Bob", 25))
    }
    // SampleEnd
}
/**
 * Example:
 * @sample [Sample]
 */
public inline fun <reified T : DataRowSchema> dataFrameOf(vararg rows: T): DataFrame<T> =
    rows.asIterable().toDataFrame()
Collaborator
append could also do with a small sample like this :) |
||
| * } | ||
| * ``` | ||
| */ | ||
| public inline fun <reified T : DataRowSchema> dataFrameOf(vararg rows: T): DataFrame<T> = | ||
| rows.asIterable().toDataFrame() | ||
|
|
||
|
|
||
I'd start a bit more generally and introductorily before jumping into exactly what it does.
So: "This annotation marks an interface or data class as 'data schema'" (link to https://kotlin.github.io/dataframe/schemas.html). Then continue with "It's used to generate extension properties, etc.". Gives a bit more context to this key DataFrame component :)
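On the question raised above about why `DataRowSchema` exists as a marker: a minimal, library-free sketch of the general pattern may help. It only illustrates how a memberless interface used as a generic upper bound restricts vararg builders to schema classes. All names here (`RowSchema`, `Table`, `tableOf`, `demo`) are hypothetical stand-ins, not DataFrame API, and the class declares the marker by hand, whereas the comment above suggests the compiler plugin adds it as a supertype.

```kotlin
// Hypothetical stand-ins for illustration only; none of these names are DataFrame API.

// Marker interface: no members, it only tags classes that describe a row.
interface RowSchema

// In this sketch the supertype is written by hand.
data class Person(val name: String, val age: Int) : RowSchema

// Tiny stand-in for a typed table.
data class Table<T>(val rows: List<T>)

// The upper bound `T : RowSchema` limits this builder to marked classes,
// so a call such as tableOf(1, 2) does not compile.
fun <T : RowSchema> tableOf(vararg rows: T): Table<T> = Table(rows.toList())

// The same bound lets an append-style helper accept only rows of the table's own schema type.
fun <T : RowSchema> Table<T>.append(vararg rows: T): Table<T> = Table(this.rows + rows)

// Usage: build a one-row table, then append a second row of the same type.
fun demo(): Table<Person> =
    tableOf(Person("Alice", 30)).append(Person("Bob", 25))
```

Whether the real `dataFrameOf` and `append` rely on the marker for exactly this reason is for the KDoc author to confirm. The sketch only shows what the bound makes possible.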